In this document, we build a logistic regression model of intervention decisions. This model is a psychometric function that estimates two parameters:

1. The point of subjective equality (PSE): the level of evidence at which users see the team as having equal utility with or without the new player. This parameter reflects bias in decision-making, either toward or away from intervening.
2. The just-noticeable difference (JND): the amount of evidence needed to raise the user’s likelihood of intervening from 50% (at the PSE) to 75% (an arbitrary performance threshold). JNDs are inversely proportional to the slope of the linear model. They reflect the sensitivity of the user to evidence: smaller JNDs suggest that users evaluate prospects more consistently, while larger JNDs suggest a higher degree of noise in the perception of utility.
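
Concretely, the model estimates P(intervene) = plogis(intercept + slope * evidence), and both quantities fall out of the fitted intercept and slope. Here is a minimal sketch with made-up parameter values (the real estimates come from the models fit below):

```r
# hypothetical fitted parameters, for illustration only
intercept <- 0.4
slope <- 1.5

# PSE: the level of evidence where P(intervene) = 0.5 (log odds = 0)
pse <- -intercept / slope

# JND: the additional evidence needed to go from 50% to 75% probability of
# intervening, i.e., a log odds change of qlogis(0.75); inversely
# proportional to the slope
jnd <- qlogis(0.75) / slope

# sanity checks: 50% at the PSE, 75% one JND past it
plogis(intercept + slope * pse)          # 0.5
plogis(intercept + slope * (pse + jnd))  # 0.75
```

These are the same formulas we use later to derive PSE and JND posteriors from fitted draws.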

Load and Prepare Data

We load worker responses from our experiment and do some preprocessing.

# read in data 
full_df <- read_csv("experiment-anonymous.csv")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   workerId = col_character(),
##   condition = col_character(),
##   start_means = col_logical(),
##   gender = col_character(),
##   age = col_character(),
##   education = col_character(),
##   chart_use = col_character(),
##   strategy_with_means = col_character(),
##   strategy_without_means = col_character(),
##   outcome = col_logical(),
##   trial = col_character(),
##   trialIdx = col_character()
## )
## See spec(...) for full column specifications.
# preprocessing
responses_df <- full_df %>%
  rename( # rename to convert away from camel case
    worker_id = workerId,
    ground_truth = groundTruth,
    sd_diff = sdDiff,
    p_award_with = pAwardWith,
    p_award_without = pAwardWithout,
    account_value = accountValue,
    p_superiority = pSup,
    start_time = startTime,
    resp_time = respTime,
    trial_dur = trialDur,
    trial_idx = trialIdx
  ) %>%
  # remove practice and mock trials from responses dataframe, leave in full version
  filter(trial_idx != "practice", trial_idx != "mock") %>% 
  # repair rows where intervene was erroneously recorded as -1
  mutate(
    intervene = if_else(intervene == -1,
                        # repair
                        if_else((payoff == (award_value - 1) | payoff == -1),
                                1, # paid for intervention
                                0), # didn't pay for intervention
                        # don't repair
                        as.numeric(intervene) # coerce so both if_else() branches share a type
                        )
  ) %>%
  # set up factors for modeling
  mutate(
    # add a variable to note whether the chart they viewed showed means
    means = as.factor((start_means & as.numeric(trial) <= (n_trials / 2)) | (!start_means & as.numeric(trial) > (n_trials / 2))),
    start_means = as.factor(start_means),
    sd_diff = as.factor(sd_diff),
    trial_number = as.numeric(trial)
  )

head(responses_df)
## # A tibble: 6 x 38
##   worker_id batch n_trials n_data_conds condition baseline es_threshold
##   <chr>     <dbl>    <dbl>        <dbl> <chr>        <dbl>        <dbl>
## 1 7819bfb6     17       34           18 intervals      0.5          0.9
## 2 7819bfb6     17       34           18 intervals      0.5          0.9
## 3 7819bfb6     17       34           18 intervals      0.5          0.9
## 4 7819bfb6     17       34           18 intervals      0.5          0.9
## 5 7819bfb6     17       34           18 intervals      0.5          0.9
## 6 7819bfb6     17       34           18 intervals      0.5          0.9
## # … with 31 more variables: start_means <fct>, award_value <dbl>,
## #   starting_value <dbl>, exchange <dbl>, cutoff <dbl>, max_bonus <dbl>,
## #   total_bonus <dbl>, duration <dbl>, numeracy <dbl>, gender <chr>, age <chr>,
## #   education <chr>, chart_use <chr>, strategy_with_means <chr>,
## #   strategy_without_means <chr>, account_value <dbl>, ground_truth <dbl>,
## #   intervene <dbl>, outcome <lgl>, p_award_with <dbl>, p_award_without <dbl>,
## #   p_superiority <dbl>, payoff <dbl>, resp_time <dbl>, sd_diff <fct>,
## #   start_time <dbl>, trial <chr>, trial_dur <dbl>, trial_idx <chr>,
## #   means <fct>, trial_number <dbl>

We need the data in a format that is prepared for modeling. This means we want to calculate a scale of evidence in favor of intervention. We calculate this by applying a log odds transform to our utility-optimal decision rule, transforming our evidence from differences of probabilities into log odds units, consistent with the idea that people perceive probabilities as log odds.

model_df <- responses_df %>%
  mutate(
    # evidence scale
    p_diff = p_award_with - (p_award_without + (1 / award_value)),
    evidence = qlogis(p_award_with) - qlogis(p_award_without + (1 / award_value)),
    # scale and center trial order
    trial = (trial_number - as.numeric(n_trials) / 2) / as.numeric(n_trials)
  )
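
As a quick sanity check on this transform (with hypothetical values for p_award_without and award_value), evidence is exactly zero when p_award_with equals p_award_without + 1/award_value, i.e., at the utility-optimal decision rule:

```r
# hypothetical trial parameters, for illustration only
p_award_without <- 0.5
award_value <- 4
threshold <- p_award_without + 1 / award_value  # intervene iff p_award_with > 0.75

# evidence as defined above: a difference in log odds
evidence_for <- function(p_award_with) {
  qlogis(p_award_with) - qlogis(threshold)
}

evidence_for(threshold)        # 0: indifference under the decision rule
evidence_for(threshold + 0.1)  # > 0: evidence favors intervening
evidence_for(threshold - 0.1)  # < 0: evidence favors not intervening
```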

model_df %>%
  ggplot(aes(p_diff, evidence)) +
  geom_line() +
  geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
  geom_hline(yintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") +
  theme_bw() +
  labs(
    x = "Evidence in terms of probability",
    y = "Evidence in terms of log odds"
  )

Let’s look at the distribution of levels of evidence sampled on this scale. It should look approximately uniform.

model_df %>% ggplot(aes(x = evidence)) +
  geom_histogram(fill = "black", binwidth = 0.25) +
  geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
  xlim(quantile(model_df$evidence, c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank())
## Warning: Removed 2 rows containing missing values (geom_bar).

Now, let’s apply our exclusion criteria, cutting our sample down to only the subset of participants who passed both attention checks.

# determine exclusions
exclude_df <- model_df %>% 
  # attention check trials where ground truth = c(0.5, 0.999)
  mutate(failed_check = (ground_truth == 0.5 & intervene != 0) | (ground_truth == 0.999 & intervene != 1)) %>%
  group_by(worker_id) %>%
  summarise(
    failed_attention_checks = sum(failed_check),
    unique_p_sup = length(unique(p_superiority)),
    # excluded if they failed either attention check or used fewer than three levels of the response scale
    exclude = failed_attention_checks > 0 | unique_p_sup < 3
  ) %>% 
  dplyr::select(worker_id, exclude)

# apply exclusion criteria and remove attention check trials from modeling data set
model_df <- model_df %>% 
  left_join(exclude_df, by = "worker_id") %>% 
  filter(exclude == FALSE) %>%
  filter(ground_truth > 0.5 & ground_truth < 0.999)

# how many remaining workers per condition?
model_df %>%
  group_by(condition, start_means) %>% # between subject manipulations
  summarise(
    n_workers = length(unique(worker_id))
  )
## # A tibble: 8 x 3
## # Groups:   condition [4]
##   condition start_means n_workers
##   <chr>     <fct>           <int>
## 1 densities FALSE              79
## 2 densities TRUE               78
## 3 HOPs      FALSE              79
## 4 HOPs      TRUE               76
## 5 intervals FALSE              80
## 6 intervals TRUE               80
## 7 QDPs      FALSE              77
## 8 QDPs      TRUE               77

In addition to excluding participants who failed at least one of the two attention checks in the experiment, which is our preregistered exclusion criterion, we also exclude a handful of workers whose data lead to model fit issues. These are workers who responded with only one or two levels of the probability of superiority scale. We could make the case that these workers might not have been trying very hard when responding, but the reason for excluding them is more practical: the modeling process we are using cannot estimate random effects on response variability for these participants (i.e., you cannot estimate the variance of a set with only one or two distinct values). These random effects on variance are important because our data almost certainly violate a homogeneity of variance assumption.
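
A minimal sketch of this secondary criterion, with made-up response vectors for three hypothetical workers: a worker's responses pass only if they span at least three distinct levels of the probability of superiority scale.

```r
# made-up probability-of-superiority responses, for illustration only
one_level  <- rep(50, 12)
two_levels <- rep(c(25, 75), 6)
three_plus <- c(25, 50, 75, 60, 40, 55)

# same rule as in the exclusion code below: at least 3 distinct levels
uses_enough_levels <- function(p_superiority) {
  length(unique(p_superiority)) >= 3
}

uses_enough_levels(one_level)   # FALSE: excluded
uses_enough_levels(two_levels)  # FALSE: excluded
uses_enough_levels(three_plus)  # TRUE: retained
```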

Because of these exclusions, we are a few participants short of our target sample size of 80. We should still have more than enough data to support statistical inferences. Here we drop a handful of additional participants to maintain counterbalancing of block order. Since we know that some participants had dropped responses, we prioritize leaving out the workers with the greatest number of dropped trials in each counterbalancing condition.

model_df %>%
  group_by(condition, start_means, worker_id) %>%
  summarise(
    n_trials = n(),
    dropped_trials = 32 - n_trials
  ) %>%
  filter(dropped_trials > 0)
## # A tibble: 5 x 5
## # Groups:   condition, start_means [3]
##   condition start_means worker_id n_trials dropped_trials
##   <chr>     <fct>       <chr>        <int>          <dbl>
## 1 densities TRUE        e4b46997        24              8
## 2 HOPs      FALSE       c488db75         5             27
## 3 HOPs      FALSE       ce016e09        25              7
## 4 HOPs      FALSE       f430e2e8        28              4
## 5 intervals FALSE       ff8a2a69        28              4

Based on a comparison of the two tables above, we’ll drop workers c488db75, ce016e09, and f430e2e8 (who have the most dropped trials), as well as c337674a (from the densities condition without means at the start), to ensure our ability to fit our model and to maintain counterbalancing of block order.

# remove workers with missing data, plus one where condition = densities, start_means = FALSE
model_df <- model_df %>%
  filter(!worker_id %in% c("c488db75", "ce016e09", "f430e2e8", "c337674a"))

model_df %>%
  group_by(condition, start_means) %>% # between subject manipulations
  summarise(
    n_workers = length(unique(worker_id))
  )
## # A tibble: 8 x 3
## # Groups:   condition [4]
##   condition start_means n_workers
##   <chr>     <fct>           <int>
## 1 densities FALSE              78
## 2 densities TRUE               78
## 3 HOPs      FALSE              76
## 4 HOPs      TRUE               76
## 5 intervals FALSE              80
## 6 intervals TRUE               80
## 7 QDPs      FALSE              77
## 8 QDPs      TRUE               77

Now we have our dataset ready for modeling.

Distribution of Decisions

We start as simply as possible by modeling the distribution of decisions using an intercept-only model with a logit link function. Here we are just estimating the mean probability of intervening.

Before we fit the model to our data, let’s check that our priors seem reasonable. We’ll set a pretty wide prior on the intercept parameter and locate it at zero since this reflects a weakly informative expectation of no bias.

# get_prior(data = model_df, 
#           family = bernoulli(link = "logit"),
#           formula = bf(intervene ~ 1))

# starting as simple as possible: learn the distribution of decisions
prior.logistic_intercept <- brm(data = model_df, family = bernoulli(link = "logit"),
                                formula = bf(intervene ~ 1),
                                prior = c(prior(normal(0, 1), class = Intercept)),
                                sample_prior = "only",
                                iter = 3000, warmup = 500, chains = 2, cores = 2)
## Compiling the C++ model
## Start sampling

Let’s look at our prior predictive distribution. This should just be an even split between 0 and 1.

# prior predictive check
model_df %>%
  select(-intervene) %>%
  add_predicted_draws(prior.logistic_intercept, prediction = "intervene", seed = 1234) %>%
  ggplot(aes(x = intervene)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Prior predictive distribution for intervention decisions") +
  theme(panel.grid = element_blank())

Now, let’s fit the model to our data.

# starting as simple as possible: learn the distribution of decisions
m.logistic_intercept <- brm(data = model_df, family = bernoulli(link = "logit"),
              formula = bf(intervene ~ 1),
              prior = c(prior(normal(0, 1), class = Intercept)),
              iter = 3000, warmup = 500, chains = 2, cores = 2,
              file = "model-fits/logistic_intercept_mdl")

Check diagnostics:

# trace plots
plot(m.logistic_intercept)

# model summary
print(m.logistic_intercept)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: intervene ~ 1 
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
##          total post-warmup samples = 5000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.36      0.01     0.33     0.39 1.00     1761     2440
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s check out a posterior predictive distribution for intervention decisions.

# posterior predictive check
model_df %>%
  select(-intervene) %>% # this model should not be sensitive to evidence
  add_predicted_draws(m.logistic_intercept, prediction = "intervene", seed = 1234, n = 500) %>%
  ggplot(aes(x = intervene)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior predictive distribution for intervention") +
  theme(panel.grid = element_blank())

The posterior predictive distribution is about what we’d expect. The slight bias toward intervening is consistent with a positive intercept parameter.
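
To see what that intercept implies on the probability scale, we can invert the logit link; with the point estimate of 0.36 from the summary above:

```r
# inverse logit of the intercept: the estimated overall rate of intervening
plogis(0.36)            # ~0.589, a slight bias toward intervening
# equivalent computation by hand
1 / (1 + exp(-0.36))
```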

How do the posterior predictions compare to the observed data?

# data density
model_df %>%
  ggplot(aes(x = intervene)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Data distribution for intervention") +
  theme(panel.grid = element_blank())

Since these look pretty good, even for an intercept model, it seems like other predictive checks will be a better indicator of fit going forward.

Let’s take a look at the estimated psychometric function. This should not have any slope. It is just an estimate of what proportion of the time people intervene.

model_df %>%
  select(evidence, intervene) %>%
  add_fitted_draws(m.logistic_intercept, value = "pf", n = 500) %>%
  ggplot(aes(x = evidence, y = intervene)) +
  geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
  # geom_line(aes(y = pf, group = .draw)) +
  stat_lineribbon(aes(y = pf), .width = c(.95, .80, .50), alpha = .25, fill = "black") +
  geom_point(alpha = .15, color = "black") +
  coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
                  ylim = quantile(model_df$intervene, c(0, 1))) +
  theme_bw() +
  theme(panel.grid.minor = element_blank())

Add Hierarchy for Slopes and Intercepts

The models we’ve created thus far fail to account for much of the noise in the data. Here, we attempt to parse some heterogeneity in responses by modeling a random effect of worker on slopes and intercepts. This introduces a hierarchical component to our model in order to account for individual differences in the best-fitting linear model for each worker’s data.

Before we fit our model let’s check that our priors seem reasonable. We add priors for the standard deviation of random slopes and intercepts per worker and their correlation. The priors for random effects are moderately weak, and the prior for their correlation is meant to avoid extreme correlations that could mess up the model fit.

# get_prior(data = model_df, family = bernoulli(link = "logit"), formula = bf(intervene ~ (1 + evidence|worker_id) + evidence))

# sample from the prior for the hierarchical logistic model
prior.wrkr.logistic <- brm(data = model_df, family = bernoulli(link = "logit"),
                           formula = bf(intervene ~ (1 + evidence|worker_id) + evidence),
                           prior = c(prior(normal(0, 1), class = Intercept),
                                     prior(normal(1, 1), class = b),
                                     prior(normal(0, 0.5), class = sd),
                                     prior(lkj(4), class = cor)),
                           sample_prior = "only",
                           iter = 3000, warmup = 500, chains = 2, cores = 2)
## Compiling the C++ model
## Start sampling

Let’s look at predicted model fits to see the space of possible models predicted by our priors.

# prior predictive check
model_df %>%
  select(evidence, worker_id) %>%
  add_fitted_draws(prior.wrkr.logistic, value = "pf", seed = 1234, n = 500) %>%
  ggplot(aes(x = evidence, y = pf)) +
  stat_lineribbon(.width = c(.95, .80, .50), alpha = .25, fill = "black") +
  coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
                  ylim = quantile(model_df$intervene, c(0, 1))) +
  labs(subtitle = "Prior predictive PF fit") +
  theme(panel.grid = element_blank())

# hierarchical linear model with logit link
m.wrkr.logistic <- brm(data = model_df, family = bernoulli(link = "logit"),
                       formula = bf(intervene ~ (1 + evidence|worker_id) + evidence),
                       prior = c(prior(normal(0, 1), class = Intercept),
                                 prior(normal(1, 1), class = b),
                                 prior(normal(0, 0.5), class = sd),
                                 prior(lkj(4), class = cor)),
                       iter = 3000, warmup = 500, chains = 2, cores = 2,
                       file = "model-fits/logistic_mdl-wrkr")

Check diagnostics:

# trace plots
plot(m.wrkr.logistic)

# pairs plot
pairs(m.wrkr.logistic)

# model summary
print(m.wrkr.logistic)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: intervene ~ (1 + evidence | worker_id) + evidence 
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
##          total post-warmup samples = 5000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept)               1.29      0.05     1.20     1.39 1.00     1686
## sd(evidence)                0.95      0.05     0.86     1.05 1.00     1561
## cor(Intercept,evidence)     0.50      0.04     0.41     0.58 1.00     1960
##                         Tail_ESS
## sd(Intercept)               3186
## sd(evidence)                2532
## cor(Intercept,evidence)     2969
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.42      0.06     0.31     0.54 1.00     1482     2807
## evidence      1.51      0.05     1.41     1.61 1.00     2215     2966
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s take a look at the estimated psychometric functions for each worker. When we make this kind of plot for model checks at the level of individual workers, we’ll look at a subset of workers to keep the number of charts manageable.

# two workers from each counterbalancing condition
model_check_set <- model_df %>% 
  group_by(start_means, condition, worker_id) %>%
  summarise() %>%
  top_n(2)
## Selecting by worker_id
model_check_set <- model_check_set$worker_id
model_check_df <- model_df %>%
  filter(worker_id %in% model_check_set)

model_check_df %>% 
  group_by(worker_id) %>%
  summarise()
## # A tibble: 16 x 1
##    worker_id
##    <chr>    
##  1 f27ed3b6 
##  2 f4f534e0 
##  3 f5d48035 
##  4 f796f54d 
##  5 f7f69f44 
##  6 f83e2827 
##  7 fa0f4b94 
##  8 fa22b8bb 
##  9 fba3405d 
## 10 fccb21d5 
## 11 fd15ec30 
## 12 fd3bea1b 
## 13 fdb8555e 
## 14 fe8936cd 
## 15 fee45dce 
## 16 ff8a2a69
model_check_df %>%
  group_by(evidence, worker_id) %>%
  add_fitted_draws(m.wrkr.logistic, value = "pf", n = 200) %>%
  ggplot(aes(x = evidence, y = intervene, color = condition, fill = condition)) +
  geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
  stat_lineribbon(aes(y = pf), .width = c(.95), alpha = .25) +
  geom_point(alpha = .15) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) + 
  coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
                  ylim = quantile(model_df$intervene, c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) +
  facet_wrap(. ~ worker_id)

These fits are looking better. Let’s try expanding our model to include the manipulations in the experiment that we need to account for to answer our research questions.

Add Predictors to Answer Research Questions

Presence/Absence of the Mean

We want to know whether adding means to visualizations biases the point of subjective equality or changes sensitivity to utility on average. We model these effects as a fixed intercept and a fixed slope (i.e., an interaction with the level of evidence).

Before we fit our model let’s check that our priors seem reasonable. Now that we are modeling multiple slopes using dummy coding, we need to differentiate between the baseline slope when there are no means (for which we use the same prior as before) and the effect of means on slopes (a difference from baseline which should be located at 0 and have a moderately weak prior).

# get_prior(data = model_df, family = bernoulli(link = "logit"), formula = bf(intervene ~ (1 + evidence|worker_id) + evidence*means))

# sample from the prior for the model with predictors for means
prior.wrkr.means.logistic <- brm(data = model_df, family = bernoulli(link = "logit"),
                                 formula = bf(intervene ~ (1 + evidence|worker_id) + evidence * means),
                                 prior = c(prior(normal(0, 1), class = Intercept),
                                           prior(normal(1, 1), class = b, coef = evidence),
                                           prior(normal(0, 0.5), class = b),
                                           prior(normal(0, 0.5), class = sd),
                                           prior(lkj(4), class = cor)),
                                 sample_prior = "only",
                                 iter = 3000, warmup = 500, chains = 2, cores = 2)
## Compiling the C++ model
## Start sampling

Let’s look at predicted model fits to see the space of possible models predicted by our priors.

# prior predictive check
model_df %>%
  select(evidence, worker_id, means) %>%
  add_fitted_draws(prior.wrkr.means.logistic, value = "pf", seed = 1234, n = 500) %>%
  ggplot(aes(x = evidence, y = pf)) +
  stat_lineribbon(.width = c(.95, .80, .50), alpha = .25, fill = "black") +
  coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
                  ylim = quantile(model_df$intervene, c(0, 1))) +
  labs(subtitle = "Prior predictive PF fit") +
  theme(panel.grid = element_blank())

# hierarchical linear model with logit link and predictors for the presence/absence of means
m.wrkr.means.logistic <- brm(data = model_df, family = bernoulli(link = "logit"),
                             formula = bf(intervene ~ (1 + evidence|worker_id) + evidence * means),
                             prior = c(prior(normal(0, 1), class = Intercept),
                                       prior(normal(1, 1), class = b, coef = evidence),
                                       prior(normal(0, 0.5), class = b),
                                       prior(normal(0, 0.5), class = sd),
                                       prior(lkj(4), class = cor)),
                             iter = 3000, warmup = 500, chains = 2, cores = 2,
                             file = "model-fits/logistic_mdl-wrkr_means")

Check diagnostics:

  • Trace plots
# trace plots
plot(m.wrkr.means.logistic)

  • Pairs plot
# pairs plot
pairs(m.wrkr.means.logistic)

  • Summary
# model summary
print(m.wrkr.means.logistic)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: intervene ~ (1 + evidence | worker_id) + evidence * means 
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
##          total post-warmup samples = 5000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept)               1.29      0.05     1.20     1.39 1.00     1847
## sd(evidence)                0.95      0.05     0.86     1.05 1.00     1461
## cor(Intercept,evidence)     0.50      0.04     0.41     0.58 1.00     1932
##                         Tail_ESS
## sd(Intercept)               2805
## sd(evidence)                2271
## cor(Intercept,evidence)     2746
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept              0.43      0.06     0.32     0.55 1.00     1617     2701
## evidence               1.57      0.05     1.46     1.67 1.00     2245     3290
## meansTRUE             -0.02      0.04    -0.09     0.05 1.00    10552     3430
## evidence:meansTRUE    -0.11      0.03    -0.17    -0.04 1.00    11413     3823
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s take a look at the estimated psychometric functions for each worker.

model_check_df %>%
  group_by(evidence, worker_id, means) %>%
  add_fitted_draws(m.wrkr.means.logistic, value = "pf", n = 200) %>%
  ggplot(aes(x = evidence, y = intervene, color = condition, fill = condition)) +
  geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
  stat_lineribbon(aes(y = pf), .width = c(.95), alpha = .25) +
  geom_point(alpha = .15) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) + 
  coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
                  ylim = quantile(model_df$intervene, c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) +
  facet_wrap(. ~ worker_id)

It looks like there may be some differences per visualization condition that our model is not capturing.

What do the posteriors for just-noticeable differences (JND) and points of subjective equality (PSE) look like when means are present vs absent? Since we are probing conditional expectations, we’ll forgo calculating marginal effects by manually combining parameters. Instead we’ll use add_fitted_draws and compare_levels to get slopes and intercepts. Then we’ll use these to derive estimates of the JND and PSE for each fitted draw.

# get slopes from linear model
slopes_df <- model_df %>%
  group_by(means) %>%
  data_grid(evidence = c(0, 1)) %>%
  add_fitted_draws(m.wrkr.means.logistic, re_formula = NA, scale = "linear") %>%
  compare_levels(.value, by = evidence) %>%
  rename(slope = .value)

# get intercepts from linear model
intercepts_df <- model_df %>%
  group_by(means) %>%
  data_grid(evidence = 0) %>% 
  add_fitted_draws(m.wrkr.means.logistic, re_formula = NA, scale = "linear") %>%
  rename(intercept = .value) 

# join dataframes for slopes and intercepts, calculate PSE and JND
stats_df <- slopes_df %>% 
  full_join(intercepts_df, by = c("means", ".draw")) %>%
  mutate(
    pse = -intercept / slope,
    jnd = qlogis(0.75) / slope
  )

First, let’s look at the estimates of JNDs per condition.

stats_df %>%
  # compare_levels(jnd, by = means) %>%
  # mutate(jnd_p_award = exp(jnd) / (1 / (unique(model_df$baseline) + 1 / unique(model_df$award_value)) - 1 + exp(jnd)) - unique(model_df$baseline) - 1 / unique(model_df$award_value)) %>%
  ggplot(aes(x = jnd, group = means, color = means, fill = means)) +
  geom_density(alpha = 0.35) +
  # ggplot(aes(x = jnd_p_award)) +
  # geom_density(alpha = 0.35, fill = "black") +
  scale_x_continuous(expression(jnd), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior JND per condition") +
  # labs(subtitle = "Posterior difference in JNDs with means present vs absent") +
  theme(panel.grid = element_blank())

It looks like users are similarly sensitive to evidence whether means are present or absent, ignoring the effect of other manipulations.

Next, we’ll look at the point of subjective equality in each condition.

stats_df %>%
  ggplot(aes(x = pse, group = means, color = means, fill = means)) +
  geom_density(alpha = 0.35) +
  scale_x_continuous(expression(pse), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior PSE per condition") +
  theme(panel.grid = element_blank())

It looks like the point of subjective equality is less biased when means are present, ignoring the impact of other manipulations.

Uncertainty Shown

We expect that the level of uncertainty shown will impact judgments of effect size. Does it similarly affect the impact of means on decision-making? We test this by adding sd_diff to the interaction term in our model specification.

We’ll use the same priors as in our previous model, so we’ll jump right to fitting the model.

# hierarchical linear model with logit link and predictors for the presence/absence of means, level of uncertainty shown
m.wrkr.means.sd.logistic <- brm(data = model_df, family = bernoulli(link = "logit"),
                                formula = bf(intervene ~ (1 + evidence|worker_id) + evidence*means*sd_diff),
                                prior = c(prior(normal(0, 1), class = Intercept),
                                          prior(normal(1, 1), class = b, coef = evidence),
                                          prior(normal(0, 0.5), class = b),
                                          prior(normal(0, 0.5), class = sd),
                                          prior(lkj(4), class = cor)),
                                iter = 3000, warmup = 500, chains = 2, cores = 2,
                                file = "model-fits/logistic_mdl-wrkr_means_sd")

Check diagnostics:

  • Trace plots
# trace plots
plot(m.wrkr.means.sd.logistic)

  • Pairs plot
# pairs plot
pairs(m.wrkr.means.sd.logistic, pars = c("b_"))

# pairs plot
pairs(m.wrkr.means.sd.logistic, pars = c("sd_worker_id", "cor_worker_id"))

  • Summary
print(m.wrkr.means.sd.logistic)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: intervene ~ (1 + evidence | worker_id) + evidence * means * sd_diff 
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
##          total post-warmup samples = 5000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept)               1.41      0.05     1.31     1.51 1.00     1675
## sd(evidence)                1.04      0.05     0.95     1.14 1.00     1492
## cor(Intercept,evidence)     0.50      0.04     0.42     0.58 1.00     1609
##                         Tail_ESS
## sd(Intercept)               2845
## sd(evidence)                2592
## cor(Intercept,evidence)     2706
## 
## Population-Level Effects: 
##                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                       -0.06      0.07    -0.20     0.07 1.00     1290
## evidence                         1.48      0.06     1.36     1.60 1.00     1910
## meansTRUE                       -0.17      0.05    -0.27    -0.07 1.00     5682
## sd_diff15                        1.16      0.06     1.05     1.27 1.00     5670
## evidence:meansTRUE              -0.04      0.05    -0.13     0.05 1.00     5560
## evidence:sd_diff15               0.54      0.05     0.43     0.64 1.00     4891
## meansTRUE:sd_diff15              0.27      0.08     0.12     0.42 1.00     5093
## evidence:meansTRUE:sd_diff15    -0.17      0.07    -0.31    -0.02 1.00     4708
##                              Tail_ESS
## Intercept                        2195
## evidence                         3237
## meansTRUE                        3922
## sd_diff15                        4126
## evidence:meansTRUE               3972
## evidence:sd_diff15               4067
## meansTRUE:sd_diff15              3732
## evidence:meansTRUE:sd_diff15     3746
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s take a look at the estimated psychometric functions for each worker.

model_check_df %>%
  group_by(evidence, worker_id, means, sd_diff) %>%
  add_fitted_draws(m.wrkr.means.sd.logistic, value = "pf", n = 200) %>%
  ggplot(aes(x = evidence, y = intervene, color = condition, fill = condition)) +
  geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
  stat_lineribbon(aes(y = pf), .width = c(.95), alpha = .25) +
  geom_point(alpha = .15) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) + 
  coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
                  ylim = quantile(model_df$intervene, c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) +
  facet_wrap(. ~ worker_id)

It looks like there may be some differences per visualization condition that our model is not capturing.

What do the posteriors for just-noticeable differences (JNDs) and points of subjective equality (PSEs) look like when means are present vs absent at different levels of uncertainty?

# get slopes from linear model
slopes_df <- model_df %>%
  group_by(means, sd_diff) %>%
  data_grid(evidence = c(0, 1)) %>%
  add_fitted_draws(m.wrkr.means.sd.logistic, re_formula = NA, scale = "linear") %>%
  compare_levels(.value, by = evidence) %>%
  rename(slope = .value)

# get intercepts from linear model
intercepts_df <- model_df %>%
  group_by(means, sd_diff) %>%
  data_grid(evidence = 0) %>% 
  add_fitted_draws(m.wrkr.means.sd.logistic, re_formula = NA, scale = "linear") %>%
  rename(intercept = .value) 

# join dataframes for slopes and intercepts, calculate PSE and JND
stats_df <- slopes_df %>% 
  full_join(intercepts_df, by = c("means", "sd_diff", ".draw")) %>%
  mutate(
    pse = -intercept / slope,
    jnd = qlogis(0.75) / slope
  )
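As a sanity check on these formulas, here is a minimal sketch with made-up slope and intercept values (not draws from the model) confirming that the probability of intervening is 50% at the PSE and 75% one JND above it:

```r
# verify the PSE and JND formulas on hypothetical coefficients
# (made-up values for illustration, not model estimates)
slope <- 1.5       # slope of the linear model on the logit scale
intercept <- -0.3  # intercept on the logit scale

pse <- -intercept / slope    # evidence level where P(intervene) = 0.5
jnd <- qlogis(0.75) / slope  # evidence to move from 50% to 75%

plogis(intercept + slope * pse)          # 0.5 at the PSE
plogis(intercept + slope * (pse + jnd))  # 0.75 one JND above the PSE
```

Because the JND depends only on the slope, steeper psychometric functions imply smaller JNDs and greater sensitivity to evidence.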

First, let’s look at the estimates of JNDs per condition.

stats_df %>%
  # compare_levels(jnd, by = means) %>%
  # mutate(jnd_p_award = exp(jnd) / (1 / (unique(model_df$baseline) + 1 / unique(model_df$award_value)) - 1 + exp(jnd)) - unique(model_df$baseline) - 1 / unique(model_df$award_value)) %>%
  ggplot(aes(x = jnd, group = means, color = means, fill = means)) +
  geom_density(alpha = 0.35) +
  # ggplot(aes(x = jnd_p_award)) +
  # geom_density(alpha = 0.35, fill = "black") +
  scale_x_continuous(expression(jnd), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior JND per condition") +
  # labs(subtitle = "Posterior difference in JNDs with means present vs absent") +
  theme(panel.grid = element_blank()) +
  facet_grid(. ~ sd_diff)

It looks like extrinsic means improve sensitivity to evidence when uncertainty is high, ignoring the effect of other manipulations. This is consistent with the debiasing effect of means on effect size judgments under high uncertainty.

Next, we’ll look at the point of subjective equality in each condition.

stats_df %>%
  ggplot(aes(x = pse, group = means, color = means, fill = means)) +
  geom_density(alpha = 0.35) +
  scale_x_continuous(expression(pse), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior PSE per condition") +
  theme(panel.grid = element_blank()) +
  facet_grid(. ~ sd_diff)

It looks like the point of subjective equality is slightly biased when means are present and uncertainty is low. However, when uncertainty is high, users are less biased without means and tend to lean away from intervention when means are present. This is ignoring the impact of other manipulations.

Visualization Conditions

We also want to know whether different visualization conditions bias the point of subjective equality or change sensitivity to utility on average. Just as we did for the effect of extrinsic means, we model these effects as a fixed intercept and a fixed slope (i.e., an interaction with the level of evidence). This is the first model on our decision data that includes all the predictors required to answer our research questions (i.e., the minimal model on our path of model expansion).
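To make this parameterization concrete, a condition's fixed intercept offset shifts the PSE, while its fixed slope offset (the interaction with evidence) rescales the JND. A minimal sketch with made-up coefficients (not model estimates):

```r
# how hypothetical condition offsets move PSE and JND
# (made-up coefficients for illustration, not model estimates)
b_intercept <- 0.0   # baseline intercept (logit scale)
b_evidence  <- 1.5   # baseline slope on evidence
b_cond      <- -0.3  # condition's intercept offset
b_ev_cond   <- 0.2   # condition's slope offset (evidence interaction)

pse_baseline <- -b_intercept / b_evidence
pse_cond     <- -(b_intercept + b_cond) / (b_evidence + b_ev_cond)  # shifted PSE

jnd_baseline <- qlogis(0.75) / b_evidence
jnd_cond     <- qlogis(0.75) / (b_evidence + b_ev_cond)  # smaller: more sensitive
```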

We’ll use the same priors as we did for our previous model. Now, let’s fit our model.

# hierarchical linear model with logit link and predictors per visualization condition
m.m.logistic <- brm(data = model_df, family = bernoulli(link = "logit"),
                    formula = bf(intervene ~ (1 + evidence|worker_id) + evidence*means*sd_diff*condition),
                    prior = c(prior(normal(0, 1), class = Intercept),
                              prior(normal(1, 1), class = b, coef = evidence),
                              prior(normal(0, 0.5), class = b),
                              prior(normal(0, 0.5), class = sd),
                              prior(lkj(4), class = cor)),
                    iter = 5000, warmup = 2000, chains = 2, cores = 2, thin = 2,
                    file = "model-fits/logistic_mdl-minimal")

Check diagnostics:

  • Trace plots
# trace plots
plot(m.m.logistic)

  • Summary
# model summary
print(m.m.logistic)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: intervene ~ (1 + evidence | worker_id) + evidence * means * sd_diff * condition 
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 5000; warmup = 2000; thin = 2;
##          total post-warmup samples = 3000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept)               1.41      0.05     1.31     1.52 1.00     1611
## sd(evidence)                1.02      0.05     0.93     1.13 1.00     1432
## cor(Intercept,evidence)     0.50      0.04     0.41     0.58 1.00     1472
##                         Tail_ESS
## sd(Intercept)               1997
## sd(evidence)                2210
## cor(Intercept,evidence)     2162
## 
## Population-Level Effects: 
##                                                 Estimate Est.Error l-95% CI
## Intercept                                           0.04      0.13    -0.21
## evidence                                            1.53      0.10     1.34
## meansTRUE                                          -0.14      0.09    -0.32
## sd_diff15                                           0.97      0.10     0.79
## conditionHOPs                                      -0.35      0.17    -0.71
## conditionintervals                                 -0.37      0.17    -0.70
## conditionQDPs                                       0.32      0.18    -0.03
## evidence:meansTRUE                                 -0.09      0.08    -0.26
## evidence:sd_diff15                                  0.37      0.09     0.19
## meansTRUE:sd_diff15                                 0.29      0.13     0.04
## evidence:conditionHOPs                             -0.27      0.14    -0.54
## evidence:conditionintervals                        -0.14      0.14    -0.42
## evidence:conditionQDPs                              0.21      0.15    -0.08
## meansTRUE:conditionHOPs                             0.10      0.13    -0.17
## meansTRUE:conditionintervals                       -0.01      0.14    -0.28
## meansTRUE:conditionQDPs                            -0.22      0.14    -0.49
## sd_diff15:conditionHOPs                             0.53      0.14     0.25
## sd_diff15:conditionintervals                        0.25      0.14    -0.02
## sd_diff15:conditionQDPs                            -0.02      0.14    -0.30
## evidence:meansTRUE:sd_diff15                       -0.07      0.12    -0.31
## evidence:meansTRUE:conditionHOPs                   -0.03      0.12    -0.25
## evidence:meansTRUE:conditionintervals               0.14      0.12    -0.09
## evidence:meansTRUE:conditionQDPs                    0.09      0.13    -0.15
## evidence:sd_diff15:conditionHOPs                    0.22      0.14    -0.04
## evidence:sd_diff15:conditionintervals               0.30      0.13     0.04
## evidence:sd_diff15:conditionQDPs                    0.16      0.14    -0.12
## meansTRUE:sd_diff15:conditionHOPs                  -0.41      0.19    -0.77
## meansTRUE:sd_diff15:conditionintervals              0.27      0.19    -0.12
## meansTRUE:sd_diff15:conditionQDPs                   0.12      0.19    -0.24
## evidence:meansTRUE:sd_diff15:conditionHOPs         -0.35      0.17    -0.69
## evidence:meansTRUE:sd_diff15:conditionintervals     0.14      0.18    -0.24
## evidence:meansTRUE:sd_diff15:conditionQDPs         -0.04      0.19    -0.41
##                                                 u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept                                           0.28 1.00      903     1762
## evidence                                            1.74 1.00     1215     2302
## meansTRUE                                           0.05 1.00     1728     2292
## sd_diff15                                           1.16 1.00     1656     2408
## conditionHOPs                                      -0.01 1.00     1007     1779
## conditionintervals                                 -0.04 1.00     1133     2003
## conditionQDPs                                       0.65 1.00     1047     1667
## evidence:meansTRUE                                  0.07 1.00     1852     2621
## evidence:sd_diff15                                  0.55 1.00     1788     2453
## meansTRUE:sd_diff15                                 0.54 1.00     1694     2310
## evidence:conditionHOPs                              0.01 1.00     1340     2242
## evidence:conditionintervals                         0.13 1.00     1405     1827
## evidence:conditionQDPs                              0.50 1.00     1395     2042
## meansTRUE:conditionHOPs                             0.35 1.00     1921     2510
## meansTRUE:conditionintervals                        0.25 1.00     2037     2650
## meansTRUE:conditionQDPs                             0.05 1.00     1857     2395
## sd_diff15:conditionHOPs                             0.80 1.00     1874     2218
## sd_diff15:conditionintervals                        0.52 1.00     1894     2481
## sd_diff15:conditionQDPs                             0.26 1.00     2120     2320
## evidence:meansTRUE:sd_diff15                        0.18 1.00     1813     2480
## evidence:meansTRUE:conditionHOPs                    0.21 1.00     2150     2281
## evidence:meansTRUE:conditionintervals               0.38 1.00     2093     2348
## evidence:meansTRUE:conditionQDPs                    0.33 1.00     1692     2530
## evidence:sd_diff15:conditionHOPs                    0.49 1.00     2017     2422
## evidence:sd_diff15:conditionintervals               0.57 1.00     2172     2411
## evidence:sd_diff15:conditionQDPs                    0.44 1.00     2141     2748
## meansTRUE:sd_diff15:conditionHOPs                  -0.05 1.00     2000     2288
## meansTRUE:sd_diff15:conditionintervals              0.64 1.00     2087     2481
## meansTRUE:sd_diff15:conditionQDPs                   0.49 1.00     1919     2149
## evidence:meansTRUE:sd_diff15:conditionHOPs         -0.01 1.00     2089     2644
## evidence:meansTRUE:sd_diff15:conditionintervals     0.51 1.00     2097     2655
## evidence:meansTRUE:sd_diff15:conditionQDPs          0.32 1.00     2024     2427
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s take a look at the estimated psychometric functions for each worker.

model_check_df %>%
  group_by(evidence, worker_id, means, sd_diff, condition) %>%
  add_fitted_draws(m.m.logistic, value = "pf", n = 500) %>%
  ggplot(aes(x = evidence, y = intervene, color = condition, fill = condition)) +
  geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
  stat_lineribbon(aes(y = pf), .width = c(.95), alpha = .25) +
  geom_point(alpha = .15) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) + 
  coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
                  ylim = quantile(model_df$intervene, c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) +
  facet_wrap(. ~ worker_id)

What do the posteriors for just-noticeable differences (JNDs) and points of subjective equality (PSEs) look like? We’ll start by getting the slopes and intercepts of the linear model and using these to derive estimates of the JND and PSE for each fitted draw.

# get slopes from linear model
slopes_df <- model_df %>%
  group_by(means, sd_diff, condition) %>%
  data_grid(evidence = c(0, 1)) %>%
  add_fitted_draws(m.m.logistic, re_formula = NA, scale = "linear") %>%
  compare_levels(.value, by = evidence) %>%
  rename(slope = .value)

# get intercepts from linear model
intercepts_df <- model_df %>%
  group_by(means, sd_diff, condition) %>%
  data_grid(evidence = 0) %>%
  add_fitted_draws(m.m.logistic, re_formula = NA, scale = "linear") %>%
  rename(intercept = .value) 

# join dataframes for slopes and intercepts, calculate PSE and JND
stats_df <- slopes_df %>% 
  full_join(intercepts_df, by = c("means", "sd_diff", "condition", ".draw")) %>%
  mutate(
    pse = -intercept / slope,
    jnd = qlogis(0.75) / slope
  )

First, let’s look at the estimates of JNDs per visualization, marginalizing across other manipulations.

stats_df %>%
  group_by(condition, .draw) %>%          # marginalize out other manipulations
  summarise(jnd = weighted.mean(jnd)) %>%
  ggplot(aes(x = jnd, group = condition, color = condition, fill = condition)) +
  geom_density(alpha = 0.35) +
  scale_fill_brewer(type = "qual", palette = 2) + 
  scale_color_brewer(type = "qual", palette = 2) +
  scale_x_continuous(expression(jnd), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior JND per visualization") +
  theme(panel.grid = element_blank())

It looks like users are most sensitive to evidence (i.e., JNDs are smallest) with quantile dotplots and least sensitive with HOPs.

What if we look at the full interaction of extrinsic means, level of uncertainty, and visualization condition on JNDs?

stats_df %>%
  ggplot(aes(x = jnd, group = means, color = means, fill = means)) +
  geom_density(alpha = 0.35) +
  scale_x_continuous(expression(jnd), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior JND per condition") +
  theme(panel.grid = element_blank()) +
  facet_grid(condition ~ sd_diff)

It seems like means improve sensitivity to evidence a little bit for intervals but reduce sensitivity for every other visualization condition, especially HOPs when uncertainty is high.

Next, we’ll look at the point of subjective equality for each visualization condition, marginalizing out other manipulations.

stats_df %>%
  group_by(condition, .draw) %>%          # marginalize out other manipulations
  summarise(pse = weighted.mean(pse)) %>%
  ggplot(aes(x = pse, group = condition, color = condition, fill = condition)) +
  geom_density(alpha = 0.35) +
  scale_fill_brewer(type = "qual", palette = 2) + 
  scale_color_brewer(type = "qual", palette = 2) +
  scale_x_continuous(expression(pse), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior PSE per visualization") +
  theme(panel.grid = element_blank())

It looks like users are biased toward intervening in all conditions, but least so with intervals.

What if we look at the full interaction of extrinsic means, level of uncertainty, and visualization condition on PSE?

stats_df %>%
  ggplot(aes(x = pse, group = means, color = means, fill = means)) +
  geom_density(alpha = 0.35) +
  scale_x_continuous(expression(pse), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior PSE per condition") +
  theme(panel.grid = element_blank()) +
  facet_grid(condition ~ sd_diff)

In all conditions, adding means biases people toward intervening when uncertainty is low and away from intervening when uncertainty is high. This only seems to result in more optimal decisions for densities and quantile dotplots when uncertainty is low.

Building Up Random Effects for Within-Subjects Manipulations

In the minimal model above, estimates for the effect of means are noisier than we would like. We’ll try to better account for heterogeneity across subjects by adding random effects to our model for each within-subjects manipulation.

Following a principle of model expansion, we will make these changes cumulatively. We include a series of model specifications that capture plausible structure in the data and fit without any sampling issues.

Random Effects for the Interaction of Means and Uncertainty Shown

This model adds random effects for the presence/absence of means and the level of uncertainty. This tells our model that each individual can respond differently to these factors, which may help to parse heterogeneity in responses. Note that we are unable to identify a random effect for the full interaction of evidence, means, and level of uncertainty because we have only one observation per unique combination of these variables per worker. This is why the random-effects formula does not match the fixed-effects formula.
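To see why that term is unidentifiable, consider a toy design (hypothetical workers and stimulus levels, not our data) in which each worker responds once per unique combination of evidence, means, and sd_diff:

```r
# toy design with one observation per worker per evidence x means x sd_diff
# cell, leaving no replication to estimate a per-worker three-way random slope
library(dplyr)
library(tidyr)

toy_design <- expand_grid(
  worker_id = paste0("w", 1:3),   # hypothetical workers
  evidence  = c(-1, 0, 1),        # hypothetical unique evidence levels
  means     = c(FALSE, TRUE),
  sd_diff   = c(5, 15)
)

toy_design %>%
  count(worker_id, evidence, means, sd_diff) %>%
  summarise(max_obs_per_cell = max(n))  # 1: no within-cell replication
```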

m.m.logistic.r_means.sd <- brm(
  data = model_df, family = bernoulli(link = "logit"),
  formula = bf(intervene ~ (1 + evidence + means*sd_diff|worker_id) + evidence*means*sd_diff*condition),
  prior = c(prior(normal(0, 1), class = Intercept),
            prior(normal(1, 1), class = b, coef = evidence),
            prior(normal(0, 0.5), class = b),
            prior(normal(0, 0.5), class = sd),
            prior(lkj(4), class = cor)),
  iter = 5000, warmup = 2000, chains = 2, cores = 2, thin = 2,
  file = "model-fits/logistic_mdl-min-r_means_sd")
# trace plots
plot(m.m.logistic.r_means.sd)

summary(m.m.logistic.r_means.sd)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: intervene ~ (1 + evidence + means * sd_diff | worker_id) + evidence * means * sd_diff * condition 
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 5000; warmup = 2000; thin = 2;
##          total post-warmup samples = 3000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                                    Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept)                          1.93      0.08     1.78     2.09 1.00
## sd(evidence)                           1.21      0.06     1.10     1.32 1.00
## sd(meansTRUE)                          1.61      0.09     1.44     1.80 1.00
## sd(sd_diff15)                          1.11      0.08     0.95     1.28 1.00
## sd(meansTRUE:sd_diff15)                0.66      0.14     0.39     0.93 1.00
## cor(Intercept,evidence)                0.31      0.05     0.21     0.41 1.00
## cor(Intercept,meansTRUE)              -0.32      0.06    -0.44    -0.20 1.00
## cor(evidence,meansTRUE)                0.06      0.06    -0.07     0.18 1.00
## cor(Intercept,sd_diff15)              -0.50      0.07    -0.62    -0.37 1.00
## cor(evidence,sd_diff15)                0.17      0.08     0.02     0.32 1.00
## cor(meansTRUE,sd_diff15)               0.29      0.08     0.13     0.44 1.00
## cor(Intercept,meansTRUE:sd_diff15)     0.04      0.16    -0.27     0.36 1.00
## cor(evidence,meansTRUE:sd_diff15)      0.34      0.14     0.07     0.62 1.00
## cor(meansTRUE,meansTRUE:sd_diff15)    -0.55      0.12    -0.77    -0.29 1.00
## cor(sd_diff15,meansTRUE:sd_diff15)    -0.15      0.17    -0.45     0.20 1.00
##                                    Bulk_ESS Tail_ESS
## sd(Intercept)                          2051     2478
## sd(evidence)                           1736     2634
## sd(meansTRUE)                          2124     2525
## sd(sd_diff15)                          2011     2538
## sd(meansTRUE:sd_diff15)                1708     2448
## cor(Intercept,evidence)                1757     2473
## cor(Intercept,meansTRUE)               2383     2826
## cor(evidence,meansTRUE)                2341     2518
## cor(Intercept,sd_diff15)               2654     2732
## cor(evidence,sd_diff15)                2427     2512
## cor(meansTRUE,sd_diff15)               2196     2341
## cor(Intercept,meansTRUE:sd_diff15)     2545     2541
## cor(evidence,meansTRUE:sd_diff15)      2126     2469
## cor(meansTRUE,meansTRUE:sd_diff15)     2756     2300
## cor(sd_diff15,meansTRUE:sd_diff15)     2225     2706
## 
## Population-Level Effects: 
##                                                 Estimate Est.Error l-95% CI
## Intercept                                          -0.01      0.15    -0.31
## evidence                                            1.77      0.12     1.54
## meansTRUE                                          -0.18      0.14    -0.46
## sd_diff15                                           1.14      0.12     0.90
## conditionHOPs                                      -0.42      0.21    -0.81
## conditionintervals                                 -0.42      0.21    -0.84
## conditionQDPs                                       0.31      0.21    -0.10
## evidence:meansTRUE                                 -0.07      0.10    -0.26
## evidence:sd_diff15                                  0.36      0.11     0.16
## meansTRUE:sd_diff15                                 0.43      0.15     0.15
## evidence:conditionHOPs                             -0.29      0.16    -0.60
## evidence:conditionintervals                        -0.16      0.16    -0.47
## evidence:conditionQDPs                              0.22      0.17    -0.11
## meansTRUE:conditionHOPs                             0.13      0.20    -0.27
## meansTRUE:conditionintervals                       -0.01      0.20    -0.42
## meansTRUE:conditionQDPs                            -0.19      0.20    -0.58
## sd_diff15:conditionHOPs                             0.59      0.18     0.23
## sd_diff15:conditionintervals                        0.28      0.17    -0.05
## sd_diff15:conditionQDPs                             0.01      0.18    -0.33
## evidence:meansTRUE:sd_diff15                       -0.01      0.14    -0.27
## evidence:meansTRUE:conditionHOPs                   -0.02      0.13    -0.26
## evidence:meansTRUE:conditionintervals               0.17      0.13    -0.08
## evidence:meansTRUE:conditionQDPs                    0.16      0.14    -0.13
## evidence:sd_diff15:conditionHOPs                    0.20      0.14    -0.08
## evidence:sd_diff15:conditionintervals               0.30      0.15     0.01
## evidence:sd_diff15:conditionQDPs                    0.19      0.15    -0.12
## meansTRUE:sd_diff15:conditionHOPs                  -0.44      0.21    -0.83
## meansTRUE:sd_diff15:conditionintervals              0.31      0.21    -0.09
## meansTRUE:sd_diff15:conditionQDPs                   0.17      0.21    -0.24
## evidence:meansTRUE:sd_diff15:conditionHOPs         -0.33      0.18    -0.69
## evidence:meansTRUE:sd_diff15:conditionintervals     0.16      0.20    -0.22
## evidence:meansTRUE:sd_diff15:conditionQDPs         -0.04      0.20    -0.42
##                                                 u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept                                           0.27 1.00     2723     2789
## evidence                                            2.00 1.00     2379     2785
## meansTRUE                                           0.09 1.00     2905     2613
## sd_diff15                                           1.38 1.00     2862     2667
## conditionHOPs                                       0.00 1.00     2743     2515
## conditionintervals                                 -0.02 1.00     2385     2824
## conditionQDPs                                       0.73 1.00     2775     2730
## evidence:meansTRUE                                  0.11 1.00     2848     2906
## evidence:sd_diff15                                  0.58 1.00     2435     2829
## meansTRUE:sd_diff15                                 0.71 1.00     2475     2253
## evidence:conditionHOPs                              0.03 1.00     2391     2477
## evidence:conditionintervals                         0.17 1.00     2197     2718
## evidence:conditionQDPs                              0.55 1.00     2509     2941
## meansTRUE:conditionHOPs                             0.51 1.00     2729     2945
## meansTRUE:conditionintervals                        0.38 1.00     2914     2450
## meansTRUE:conditionQDPs                             0.22 1.00     2786     2417
## sd_diff15:conditionHOPs                             0.93 1.00     2634     2746
## sd_diff15:conditionintervals                        0.62 1.00     2661     2714
## sd_diff15:conditionQDPs                             0.36 1.00     2878     2866
## evidence:meansTRUE:sd_diff15                        0.26 1.00     2705     2539
## evidence:meansTRUE:conditionHOPs                    0.23 1.00     2286     2519
## evidence:meansTRUE:conditionintervals               0.43 1.00     2517     2637
## evidence:meansTRUE:conditionQDPs                    0.43 1.00     2750     2823
## evidence:sd_diff15:conditionHOPs                    0.48 1.00     2180     2533
## evidence:sd_diff15:conditionintervals               0.59 1.00     2014     2338
## evidence:sd_diff15:conditionQDPs                    0.49 1.00     2213     2650
## meansTRUE:sd_diff15:conditionHOPs                  -0.03 1.00     2395     2726
## meansTRUE:sd_diff15:conditionintervals              0.72 1.00     2480     2344
## meansTRUE:sd_diff15:conditionQDPs                   0.57 1.00     2559     2362
## evidence:meansTRUE:sd_diff15:conditionHOPs          0.04 1.00     2439     2471
## evidence:meansTRUE:sd_diff15:conditionintervals     0.55 1.00     2313     2520
## evidence:meansTRUE:sd_diff15:conditionQDPs          0.34 1.00     2341     2671
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Mixed Effects of Trial Order on JND and PSE

This model adds mixed effects of trial order on JNDs and PSEs. This functionally adds a learning component to our model of the latent sense of utility on which decisions are based.

# hierarchical logistic regression
m.m.logistic.r_means.sd.trial <- brm(
  data = model_df, family = bernoulli(link = "logit"),
  formula = bf(intervene ~ (1 + evidence*means*sd_diff + evidence*trial|worker_id) + evidence*means*sd_diff*condition + evidence*condition*trial),
  prior = c(prior(normal(0, 1), class = Intercept),
            prior(normal(1, 1), class = b, coef = evidence),
            prior(normal(0, 0.5), class = b),
            prior(normal(0, 0.5), class = sd),
            prior(lkj(4), class = cor)),
  iter = 5000, warmup = 2000, chains = 2, cores = 2, thin = 2,
  file = "model-fits/logistic_mdl-min-r_means_sd_trial")
# trace plots
plot(m.m.logistic.r_means.sd.trial)

summary(m.m.logistic.r_means.sd.trial)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: intervene ~ (1 + evidence * means * sd_diff + evidence * trial | worker_id) + evidence * means * sd_diff * condition + evidence * condition * trial 
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 5000; warmup = 2000; thin = 2;
##          total post-warmup samples = 3000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                                                       Estimate Est.Error
## sd(Intercept)                                             1.90      0.09
## sd(evidence)                                              1.27      0.08
## sd(meansTRUE)                                             1.33      0.11
## sd(sd_diff15)                                             1.18      0.09
## sd(trial)                                                 2.47      0.16
## sd(evidence:meansTRUE)                                    0.79      0.11
## sd(evidence:sd_diff15)                                    0.74      0.10
## sd(meansTRUE:sd_diff15)                                   0.76      0.16
## sd(evidence:trial)                                        1.49      0.16
## sd(evidence:meansTRUE:sd_diff15)                          0.70      0.19
## cor(Intercept,evidence)                                   0.56      0.05
## cor(Intercept,meansTRUE)                                 -0.15      0.09
## cor(evidence,meansTRUE)                                   0.01      0.09
## cor(Intercept,sd_diff15)                                 -0.40      0.07
## cor(evidence,sd_diff15)                                  -0.07      0.09
## cor(meansTRUE,sd_diff15)                                  0.26      0.09
## cor(Intercept,trial)                                      0.36      0.07
## cor(evidence,trial)                                       0.13      0.08
## cor(meansTRUE,trial)                                      0.20      0.08
## cor(sd_diff15,trial)                                     -0.07      0.09
## cor(Intercept,evidence:meansTRUE)                        -0.21      0.11
## cor(evidence,evidence:meansTRUE)                         -0.10      0.13
## cor(meansTRUE,evidence:meansTRUE)                         0.45      0.12
## cor(sd_diff15,evidence:meansTRUE)                         0.32      0.12
## cor(trial,evidence:meansTRUE)                             0.14      0.12
## cor(Intercept,evidence:sd_diff15)                        -0.37      0.11
## cor(evidence,evidence:sd_diff15)                         -0.09      0.12
## cor(meansTRUE,evidence:sd_diff15)                        -0.07      0.13
## cor(sd_diff15,evidence:sd_diff15)                         0.65      0.09
## cor(trial,evidence:sd_diff15)                            -0.13      0.12
## cor(evidence:meansTRUE,evidence:sd_diff15)                0.20      0.15
## cor(Intercept,meansTRUE:sd_diff15)                       -0.06      0.15
## cor(evidence,meansTRUE:sd_diff15)                         0.32      0.14
## cor(meansTRUE,meansTRUE:sd_diff15)                       -0.15      0.16
## cor(sd_diff15,meansTRUE:sd_diff15)                        0.06      0.16
## cor(trial,meansTRUE:sd_diff15)                           -0.03      0.15
## cor(evidence:meansTRUE,meansTRUE:sd_diff15)               0.17      0.18
## cor(evidence:sd_diff15,meansTRUE:sd_diff15)               0.22      0.17
## cor(Intercept,evidence:trial)                             0.28      0.09
## cor(evidence,evidence:trial)                              0.41      0.09
## cor(meansTRUE,evidence:trial)                             0.02      0.12
## cor(sd_diff15,evidence:trial)                            -0.20      0.11
## cor(trial,evidence:trial)                                 0.50      0.09
## cor(evidence:meansTRUE,evidence:trial)                    0.19      0.13
## cor(evidence:sd_diff15,evidence:trial)                   -0.13      0.13
## cor(meansTRUE:sd_diff15,evidence:trial)                   0.16      0.16
## cor(Intercept,evidence:meansTRUE:sd_diff15)              -0.23      0.14
## cor(evidence,evidence:meansTRUE:sd_diff15)                0.08      0.16
## cor(meansTRUE,evidence:meansTRUE:sd_diff15)               0.05      0.17
## cor(sd_diff15,evidence:meansTRUE:sd_diff15)               0.09      0.17
## cor(trial,evidence:meansTRUE:sd_diff15)                  -0.04      0.16
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15)      0.02      0.19
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15)      0.09      0.19
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15)     0.50      0.17
## cor(evidence:trial,evidence:meansTRUE:sd_diff15)         -0.05      0.18
##                                                       l-95% CI u-95% CI Rhat
## sd(Intercept)                                             1.73     2.08 1.00
## sd(evidence)                                              1.12     1.42 1.00
## sd(meansTRUE)                                             1.10     1.55 1.01
## sd(sd_diff15)                                             1.01     1.36 1.00
## sd(trial)                                                 2.17     2.79 1.00
## sd(evidence:meansTRUE)                                    0.56     1.01 1.00
## sd(evidence:sd_diff15)                                    0.56     0.95 1.00
## sd(meansTRUE:sd_diff15)                                   0.44     1.07 1.00
## sd(evidence:trial)                                        1.18     1.80 1.00
## sd(evidence:meansTRUE:sd_diff15)                          0.32     1.04 1.00
## cor(Intercept,evidence)                                   0.46     0.66 1.00
## cor(Intercept,meansTRUE)                                 -0.31     0.02 1.00
## cor(evidence,meansTRUE)                                  -0.16     0.18 1.00
## cor(Intercept,sd_diff15)                                 -0.54    -0.25 1.00
## cor(evidence,sd_diff15)                                  -0.24     0.10 1.00
## cor(meansTRUE,sd_diff15)                                  0.07     0.44 1.00
## cor(Intercept,trial)                                      0.23     0.49 1.00
## cor(evidence,trial)                                      -0.01     0.28 1.00
## cor(meansTRUE,trial)                                      0.03     0.35 1.00
## cor(sd_diff15,trial)                                     -0.23     0.12 1.00
## cor(Intercept,evidence:meansTRUE)                        -0.43    -0.00 1.00
## cor(evidence,evidence:meansTRUE)                         -0.34     0.16 1.00
## cor(meansTRUE,evidence:meansTRUE)                         0.21     0.65 1.00
## cor(sd_diff15,evidence:meansTRUE)                         0.09     0.55 1.00
## cor(trial,evidence:meansTRUE)                            -0.10     0.36 1.00
## cor(Intercept,evidence:sd_diff15)                        -0.57    -0.16 1.00
## cor(evidence,evidence:sd_diff15)                         -0.32     0.15 1.00
## cor(meansTRUE,evidence:sd_diff15)                        -0.32     0.19 1.00
## cor(sd_diff15,evidence:sd_diff15)                         0.46     0.82 1.00
## cor(trial,evidence:sd_diff15)                            -0.36     0.10 1.00
## cor(evidence:meansTRUE,evidence:sd_diff15)               -0.09     0.48 1.00
## cor(Intercept,meansTRUE:sd_diff15)                       -0.36     0.23 1.00
## cor(evidence,meansTRUE:sd_diff15)                         0.04     0.58 1.00
## cor(meansTRUE,meansTRUE:sd_diff15)                       -0.45     0.18 1.00
## cor(sd_diff15,meansTRUE:sd_diff15)                       -0.25     0.39 1.00
## cor(trial,meansTRUE:sd_diff15)                           -0.32     0.26 1.00
## cor(evidence:meansTRUE,meansTRUE:sd_diff15)              -0.19     0.50 1.00
## cor(evidence:sd_diff15,meansTRUE:sd_diff15)              -0.12     0.55 1.00
## cor(Intercept,evidence:trial)                             0.10     0.46 1.00
## cor(evidence,evidence:trial)                              0.22     0.59 1.00
## cor(meansTRUE,evidence:trial)                            -0.21     0.25 1.00
## cor(sd_diff15,evidence:trial)                            -0.41     0.00 1.00
## cor(trial,evidence:trial)                                 0.32     0.67 1.00
## cor(evidence:meansTRUE,evidence:trial)                   -0.06     0.44 1.00
## cor(evidence:sd_diff15,evidence:trial)                   -0.37     0.13 1.00
## cor(meansTRUE:sd_diff15,evidence:trial)                  -0.16     0.47 1.00
## cor(Intercept,evidence:meansTRUE:sd_diff15)              -0.49     0.07 1.00
## cor(evidence,evidence:meansTRUE:sd_diff15)               -0.23     0.38 1.00
## cor(meansTRUE,evidence:meansTRUE:sd_diff15)              -0.29     0.38 1.00
## cor(sd_diff15,evidence:meansTRUE:sd_diff15)              -0.24     0.43 1.00
## cor(trial,evidence:meansTRUE:sd_diff15)                  -0.35     0.26 1.00
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15)     -0.35     0.41 1.00
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15)     -0.28     0.47 1.00
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15)     0.12     0.77 1.00
## cor(evidence:trial,evidence:meansTRUE:sd_diff15)         -0.40     0.30 1.00
##                                                       Bulk_ESS Tail_ESS
## sd(Intercept)                                             1953     2802
## sd(evidence)                                              2181     2683
## sd(meansTRUE)                                             1106     1899
## sd(sd_diff15)                                             2039     2532
## sd(trial)                                                 1645     2510
## sd(evidence:meansTRUE)                                     871     1786
## sd(evidence:sd_diff15)                                    1572     2440
## sd(meansTRUE:sd_diff15)                                    927     1712
## sd(evidence:trial)                                        1609     1883
## sd(evidence:meansTRUE:sd_diff15)                           706      883
## cor(Intercept,evidence)                                   2036     2261
## cor(Intercept,meansTRUE)                                  2014     2139
## cor(evidence,meansTRUE)                                   1747     2203
## cor(Intercept,sd_diff15)                                  2189     2828
## cor(evidence,sd_diff15)                                   1893     2647
## cor(meansTRUE,sd_diff15)                                  1358     2160
## cor(Intercept,trial)                                      2009     2371
## cor(evidence,trial)                                       1558     2312
## cor(meansTRUE,trial)                                      1614     2369
## cor(sd_diff15,trial)                                      1433     1868
## cor(Intercept,evidence:meansTRUE)                         2016     2450
## cor(evidence,evidence:meansTRUE)                          1955     2121
## cor(meansTRUE,evidence:meansTRUE)                         1271     1767
## cor(sd_diff15,evidence:meansTRUE)                         1470     2139
## cor(trial,evidence:meansTRUE)                             1352     2176
## cor(Intercept,evidence:sd_diff15)                         2197     2269
## cor(evidence,evidence:sd_diff15)                          2060     2465
## cor(meansTRUE,evidence:sd_diff15)                         1646     2051
## cor(sd_diff15,evidence:sd_diff15)                         1748     2598
## cor(trial,evidence:sd_diff15)                             1920     2319
## cor(evidence:meansTRUE,evidence:sd_diff15)                1861     2017
## cor(Intercept,meansTRUE:sd_diff15)                        2086     2464
## cor(evidence,meansTRUE:sd_diff15)                         1892     2626
## cor(meansTRUE,meansTRUE:sd_diff15)                        1857     2041
## cor(sd_diff15,meansTRUE:sd_diff15)                        1740     2736
## cor(trial,meansTRUE:sd_diff15)                            1957     2502
## cor(evidence:meansTRUE,meansTRUE:sd_diff15)               1206     1549
## cor(evidence:sd_diff15,meansTRUE:sd_diff15)               1439     1933
## cor(Intercept,evidence:trial)                             2211     2331
## cor(evidence,evidence:trial)                              1942     2464
## cor(meansTRUE,evidence:trial)                             1392     2021
## cor(sd_diff15,evidence:trial)                             1457     2420
## cor(trial,evidence:trial)                                 1564     2245
## cor(evidence:meansTRUE,evidence:trial)                    1606     2290
## cor(evidence:sd_diff15,evidence:trial)                    1512     2147
## cor(meansTRUE:sd_diff15,evidence:trial)                   1243     1936
## cor(Intercept,evidence:meansTRUE:sd_diff15)               1924     2517
## cor(evidence,evidence:meansTRUE:sd_diff15)                2053     2015
## cor(meansTRUE,evidence:meansTRUE:sd_diff15)               2213     2662
## cor(sd_diff15,evidence:meansTRUE:sd_diff15)               2065     2448
## cor(trial,evidence:meansTRUE:sd_diff15)                   2134     1895
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15)      1602     2522
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15)      1665     2435
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15)     1421     1904
## cor(evidence:trial,evidence:meansTRUE:sd_diff15)          2113     2823
## 
## Population-Level Effects: 
##                                                 Estimate Est.Error l-95% CI
## Intercept                                           0.08      0.15    -0.21
## evidence                                            1.92      0.13     1.67
## meansTRUE                                          -0.19      0.15    -0.48
## sd_diff15                                           1.27      0.14     1.00
## conditionHOPs                                      -0.38      0.21    -0.78
## conditionintervals                                 -0.43      0.21    -0.84
## conditionQDPs                                       0.41      0.22    -0.01
## trial                                               1.07      0.21     0.66
## evidence:meansTRUE                                 -0.01      0.14    -0.29
## evidence:sd_diff15                                  0.67      0.14     0.39
## meansTRUE:sd_diff15                                 0.66      0.17     0.35
## evidence:conditionHOPs                             -0.33      0.17    -0.65
## evidence:conditionintervals                        -0.14      0.17    -0.48
## evidence:conditionQDPs                              0.29      0.17    -0.05
## meansTRUE:conditionHOPs                             0.07      0.21    -0.34
## meansTRUE:conditionintervals                        0.01      0.21    -0.41
## meansTRUE:conditionQDPs                            -0.31      0.21    -0.72
## sd_diff15:conditionHOPs                             0.54      0.19     0.18
## sd_diff15:conditionintervals                        0.24      0.19    -0.12
## sd_diff15:conditionQDPs                             0.01      0.20    -0.37
## evidence:trial                                      1.39      0.20     1.01
## conditionHOPs:trial                                -0.09      0.29    -0.65
## conditionintervals:trial                            0.61      0.29     0.06
## conditionQDPs:trial                                 0.34      0.30    -0.25
## evidence:meansTRUE:sd_diff15                        0.25      0.19    -0.12
## evidence:meansTRUE:conditionHOPs                   -0.01      0.17    -0.36
## evidence:meansTRUE:conditionintervals               0.21      0.18    -0.16
## evidence:meansTRUE:conditionQDPs                    0.12      0.19    -0.26
## evidence:sd_diff15:conditionHOPs                    0.17      0.18    -0.16
## evidence:sd_diff15:conditionintervals               0.22      0.18    -0.14
## evidence:sd_diff15:conditionQDPs                    0.23      0.19    -0.15
## meansTRUE:sd_diff15:conditionHOPs                  -0.39      0.22    -0.82
## meansTRUE:sd_diff15:conditionintervals              0.41      0.23    -0.01
## meansTRUE:sd_diff15:conditionQDPs                   0.17      0.23    -0.29
## evidence:conditionHOPs:trial                       -0.72      0.25    -1.21
## evidence:conditionintervals:trial                   0.62      0.25     0.12
## evidence:conditionQDPs:trial                        0.30      0.27    -0.24
## evidence:meansTRUE:sd_diff15:conditionHOPs         -0.36      0.22    -0.79
## evidence:meansTRUE:sd_diff15:conditionintervals     0.34      0.24    -0.12
## evidence:meansTRUE:sd_diff15:conditionQDPs         -0.06      0.24    -0.54
##                                                 u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept                                           0.38 1.00     1994     2394
## evidence                                            2.18 1.00     2170     2574
## meansTRUE                                           0.09 1.00     2230     2473
## sd_diff15                                           1.54 1.00     2575     2866
## conditionHOPs                                       0.04 1.00     2054     2223
## conditionintervals                                 -0.02 1.00     2339     2715
## conditionQDPs                                       0.82 1.00     2083     2495
## trial                                               1.46 1.00     2573     2647
## evidence:meansTRUE                                  0.27 1.00     2476     2723
## evidence:sd_diff15                                  0.95 1.00     2048     2108
## meansTRUE:sd_diff15                                 1.00 1.00     2710     2791
## evidence:conditionHOPs                              0.01 1.00     2088     2580
## evidence:conditionintervals                         0.18 1.00     2193     2310
## evidence:conditionQDPs                              0.63 1.00     2422     2680
## meansTRUE:conditionHOPs                             0.49 1.00     2854     2674
## meansTRUE:conditionintervals                        0.42 1.00     2613     2653
## meansTRUE:conditionQDPs                             0.10 1.00     2461     2370
## sd_diff15:conditionHOPs                             0.92 1.00     2433     2721
## sd_diff15:conditionintervals                        0.62 1.00     2480     2768
## sd_diff15:conditionQDPs                             0.40 1.00     2569     2601
## evidence:trial                                      1.78 1.00     2519     2663
## conditionHOPs:trial                                 0.49 1.00     2914     2827
## conditionintervals:trial                            1.18 1.00     2832     2829
## conditionQDPs:trial                                 0.94 1.00     2734     2693
## evidence:meansTRUE:sd_diff15                        0.62 1.00     2118     2497
## evidence:meansTRUE:conditionHOPs                    0.33 1.00     2499     2599
## evidence:meansTRUE:conditionintervals               0.58 1.00     2297     2694
## evidence:meansTRUE:conditionQDPs                    0.49 1.00     2684     2343
## evidence:sd_diff15:conditionHOPs                    0.52 1.00     2297     2331
## evidence:sd_diff15:conditionintervals               0.57 1.00     2261     2372
## evidence:sd_diff15:conditionQDPs                    0.59 1.00     2593     2505
## meansTRUE:sd_diff15:conditionHOPs                   0.04 1.00     2320     2370
## meansTRUE:sd_diff15:conditionintervals              0.87 1.00     2420     2506
## meansTRUE:sd_diff15:conditionQDPs                   0.61 1.00     2644     2680
## evidence:conditionHOPs:trial                       -0.23 1.00     2304     2530
## evidence:conditionintervals:trial                   1.13 1.00     2716     2497
## evidence:conditionQDPs:trial                        0.82 1.00     2818     2714
## evidence:meansTRUE:sd_diff15:conditionHOPs          0.05 1.00     2391     2266
## evidence:meansTRUE:sd_diff15:conditionintervals     0.82 1.00     2206     2597
## evidence:meansTRUE:sd_diff15:conditionQDPs          0.41 1.00     2586     2684
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Model Comparison

Each random effect we add multiplies the number of parameters in the model. We want to make sure these parameters contribute more to the predictive validity of the model than they risk overfitting. We'll evaluate this by comparing models using the widely applicable information criterion (WAIC). The model with the smallest WAIC achieves the best predictive validity for the fewest effective parameters.

waic(m.m.logistic, m.m.logistic.r_means.sd, m.m.logistic.r_means.sd.trial)
## Output of model 'm.m.logistic':
## 
## Computed from 3000 by 19924 log-likelihood matrix
## 
##           Estimate    SE
## elpd_waic  -8509.1  80.6
## p_waic       980.0  19.1
## waic       17018.2 161.2
## 
## 329 (1.7%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
## 
## Output of model 'm.m.logistic.r_means.sd':
## 
## Computed from 3000 by 19924 log-likelihood matrix
## 
##           Estimate    SE
## elpd_waic  -7906.6  81.2
## p_waic      1573.5  25.7
## waic       15813.3 162.4
## 
## 666 (3.3%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
## 
## Output of model 'm.m.logistic.r_means.sd.trial':
## 
## Computed from 3000 by 19924 log-likelihood matrix
## 
##           Estimate    SE
## elpd_waic  -7398.5  77.2
## p_waic      1843.9  30.2
## waic       14797.0 154.4
## 
## 961 (4.8%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
## 
## Model comparisons:
##                               elpd_diff se_diff
## m.m.logistic.r_means.sd.trial     0.0       0.0
## m.m.logistic.r_means.sd        -508.1      31.9
## m.m.logistic                  -1110.6      45.5
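As an informal check that these differences are not just sampling noise, we can compare each elpd difference to its standard error, using the values printed in the comparison table above (a rough rule of thumb, not a formal test: a difference several times its standard error is unlikely to be an artifact):

```r
# Ratio of elpd difference to its standard error, taken from the
# model comparison table above (values relative to the best model)
elpd_diff <- c(-508.1, -1110.6)
se_diff   <- c(31.9, 45.5)
round(elpd_diff / se_diff, 1)  # -15.9 -24.4; both far below -2
```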

It looks like the model with random effects for means, level of uncertainty, and trial has the lowest WAIC, so we'll move forward with that model.
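Given this model, the PSE and JND described at the top of this document can be recovered directly from the logistic coefficients. A minimal sketch, using the baseline fixed-effect estimates reported in the summary above (Intercept 0.08, evidence slope 1.92) and ignoring random effects, interactions, and posterior uncertainty:

```r
# Hypothetical point estimates from the baseline condition in the
# population-level effects table above
b0 <- 0.08  # Intercept
b1 <- 1.92  # evidence slope

# PSE: the level of evidence where P(intervene) = 0.5, i.e., logit = 0
pse <- -b0 / b1

# JND: evidence needed to move from 50% to 75% probability of intervening;
# logit(0.75) = log(3), so the JND is inversely proportional to the slope
jnd <- qlogis(0.75) / b1

round(c(pse = pse, jnd = jnd), 3)  # pse -0.042, jnd 0.572
```

In a full analysis these quantities would be computed per posterior draw (and per condition) rather than from point estimates, so that the PSE and JND carry uncertainty intervals.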

Add Predictor for Block Order

Does it matter whether participants start the study in the block with means shown? Here we add a fixed effect for block order (start_means) to our model to look into this possibility.

Again, we’ll use the same priors as in our previous models. Now, let’s fit the model.

m.m_order.logistic.r_means.sd.trial <- brm(
  data = model_df, family = bernoulli(link = "logit"),
  formula = bf(intervene ~ (1 + evidence*means*sd_diff + evidence*trial|worker_id) + evidence*means*sd_diff*condition*start_means + evidence*condition*trial),
  prior = c(prior(normal(0, 1), class = Intercept),
            prior(normal(1, 1), class = b, coef = evidence),
            prior(normal(0, 0.5), class = b),
            prior(normal(0, 0.5), class = sd),
            prior(lkj(4), class = cor)),
  iter = 5000, warmup = 2000, chains = 2, cores = 2, thin = 2,
  file = "model-fits/logistic_mdl-min_order-r_means_sd_trial")
# trace plots
plot(m.m_order.logistic.r_means.sd.trial)

summary(m.m_order.logistic.r_means.sd.trial)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: intervene ~ (1 + evidence * means * sd_diff + evidence * trial | worker_id) + evidence * means * sd_diff * condition * start_means + evidence * condition * trial 
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 5000; warmup = 2000; thin = 2;
##          total post-warmup samples = 3000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                                                       Estimate Est.Error
## sd(Intercept)                                             1.86      0.09
## sd(evidence)                                              1.24      0.08
## sd(meansTRUE)                                             1.32      0.12
## sd(sd_diff15)                                             1.15      0.09
## sd(trial)                                                 2.50      0.16
## sd(evidence:meansTRUE)                                    0.76      0.11
## sd(evidence:sd_diff15)                                    0.74      0.10
## sd(meansTRUE:sd_diff15)                                   0.71      0.16
## sd(evidence:trial)                                        1.51      0.15
## sd(evidence:meansTRUE:sd_diff15)                          0.64      0.19
## cor(Intercept,evidence)                                   0.54      0.05
## cor(Intercept,meansTRUE)                                 -0.11      0.09
## cor(evidence,meansTRUE)                                   0.05      0.09
## cor(Intercept,sd_diff15)                                 -0.37      0.08
## cor(evidence,sd_diff15)                                  -0.03      0.09
## cor(meansTRUE,sd_diff15)                                  0.23      0.10
## cor(Intercept,trial)                                      0.37      0.07
## cor(evidence,trial)                                       0.14      0.08
## cor(meansTRUE,trial)                                      0.22      0.08
## cor(sd_diff15,trial)                                     -0.05      0.09
## cor(Intercept,evidence:meansTRUE)                        -0.17      0.11
## cor(evidence,evidence:meansTRUE)                         -0.03      0.14
## cor(meansTRUE,evidence:meansTRUE)                         0.45      0.12
## cor(sd_diff15,evidence:meansTRUE)                         0.30      0.12
## cor(trial,evidence:meansTRUE)                             0.16      0.12
## cor(Intercept,evidence:sd_diff15)                        -0.36      0.11
## cor(evidence,evidence:sd_diff15)                         -0.06      0.12
## cor(meansTRUE,evidence:sd_diff15)                        -0.09      0.13
## cor(sd_diff15,evidence:sd_diff15)                         0.64      0.09
## cor(trial,evidence:sd_diff15)                            -0.10      0.11
## cor(evidence:meansTRUE,evidence:sd_diff15)                0.17      0.15
## cor(Intercept,meansTRUE:sd_diff15)                       -0.10      0.16
## cor(evidence,meansTRUE:sd_diff15)                         0.31      0.15
## cor(meansTRUE,meansTRUE:sd_diff15)                       -0.16      0.17
## cor(sd_diff15,meansTRUE:sd_diff15)                        0.06      0.17
## cor(trial,meansTRUE:sd_diff15)                           -0.08      0.16
## cor(evidence:meansTRUE,meansTRUE:sd_diff15)               0.15      0.18
## cor(evidence:sd_diff15,meansTRUE:sd_diff15)               0.25      0.18
## cor(Intercept,evidence:trial)                             0.26      0.09
## cor(evidence,evidence:trial)                              0.40      0.10
## cor(meansTRUE,evidence:trial)                             0.05      0.12
## cor(sd_diff15,evidence:trial)                            -0.17      0.10
## cor(trial,evidence:trial)                                 0.50      0.09
## cor(evidence:meansTRUE,evidence:trial)                    0.25      0.13
## cor(evidence:sd_diff15,evidence:trial)                   -0.11      0.13
## cor(meansTRUE:sd_diff15,evidence:trial)                   0.13      0.17
## cor(Intercept,evidence:meansTRUE:sd_diff15)              -0.21      0.15
## cor(evidence,evidence:meansTRUE:sd_diff15)                0.09      0.16
## cor(meansTRUE,evidence:meansTRUE:sd_diff15)              -0.00      0.18
## cor(sd_diff15,evidence:meansTRUE:sd_diff15)               0.07      0.18
## cor(trial,evidence:meansTRUE:sd_diff15)                  -0.08      0.17
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15)     -0.06      0.20
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15)      0.10      0.19
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15)     0.47      0.18
## cor(evidence:trial,evidence:meansTRUE:sd_diff15)         -0.10      0.18
##                                                       l-95% CI u-95% CI Rhat
## sd(Intercept)                                             1.70     2.04 1.00
## sd(evidence)                                              1.10     1.41 1.00
## sd(meansTRUE)                                             1.08     1.55 1.00
## sd(sd_diff15)                                             0.98     1.33 1.00
## sd(trial)                                                 2.18     2.81 1.00
## sd(evidence:meansTRUE)                                    0.54     0.98 1.00
## sd(evidence:sd_diff15)                                    0.55     0.93 1.00
## sd(meansTRUE:sd_diff15)                                   0.40     1.01 1.00
## sd(evidence:trial)                                        1.21     1.80 1.00
## sd(evidence:meansTRUE:sd_diff15)                          0.23     0.99 1.00
## cor(Intercept,evidence)                                   0.43     0.64 1.00
## cor(Intercept,meansTRUE)                                 -0.28     0.08 1.00
## cor(evidence,meansTRUE)                                  -0.12     0.24 1.00
## cor(Intercept,sd_diff15)                                 -0.52    -0.21 1.00
## cor(evidence,sd_diff15)                                  -0.21     0.14 1.00
## cor(meansTRUE,sd_diff15)                                  0.04     0.42 1.01
## cor(Intercept,trial)                                      0.24     0.49 1.00
## cor(evidence,trial)                                      -0.01     0.28 1.00
## cor(meansTRUE,trial)                                      0.06     0.37 1.00
## cor(sd_diff15,trial)                                     -0.22     0.13 1.00
## cor(Intercept,evidence:meansTRUE)                        -0.38     0.06 1.00
## cor(evidence,evidence:meansTRUE)                         -0.29     0.24 1.00
## cor(meansTRUE,evidence:meansTRUE)                         0.19     0.68 1.00
## cor(sd_diff15,evidence:meansTRUE)                         0.06     0.53 1.00
## cor(trial,evidence:meansTRUE)                            -0.07     0.40 1.00
## cor(Intercept,evidence:sd_diff15)                        -0.56    -0.14 1.00
## cor(evidence,evidence:sd_diff15)                         -0.29     0.19 1.00
## cor(meansTRUE,evidence:sd_diff15)                        -0.34     0.17 1.00
## cor(sd_diff15,evidence:sd_diff15)                         0.45     0.81 1.00
## cor(trial,evidence:sd_diff15)                            -0.32     0.12 1.00
## cor(evidence:meansTRUE,evidence:sd_diff15)               -0.13     0.44 1.00
## cor(Intercept,meansTRUE:sd_diff15)                       -0.41     0.20 1.00
## cor(evidence,meansTRUE:sd_diff15)                         0.01     0.58 1.00
## cor(meansTRUE,meansTRUE:sd_diff15)                       -0.47     0.21 1.00
## cor(sd_diff15,meansTRUE:sd_diff15)                       -0.25     0.42 1.00
## cor(trial,meansTRUE:sd_diff15)                           -0.41     0.22 1.00
## cor(evidence:meansTRUE,meansTRUE:sd_diff15)              -0.20     0.50 1.00
## cor(evidence:sd_diff15,meansTRUE:sd_diff15)              -0.10     0.58 1.00
## cor(Intercept,evidence:trial)                             0.08     0.44 1.00
## cor(evidence,evidence:trial)                              0.21     0.59 1.00
## cor(meansTRUE,evidence:trial)                            -0.19     0.27 1.00
## cor(sd_diff15,evidence:trial)                            -0.38     0.03 1.00
## cor(trial,evidence:trial)                                 0.33     0.67 1.00
## cor(evidence:meansTRUE,evidence:trial)                   -0.01     0.49 1.00
## cor(evidence:sd_diff15,evidence:trial)                   -0.37     0.15 1.00
## cor(meansTRUE:sd_diff15,evidence:trial)                  -0.21     0.47 1.00
## cor(Intercept,evidence:meansTRUE:sd_diff15)              -0.50     0.09 1.00
## cor(evidence,evidence:meansTRUE:sd_diff15)               -0.23     0.40 1.00
## cor(meansTRUE,evidence:meansTRUE:sd_diff15)              -0.34     0.34 1.00
## cor(sd_diff15,evidence:meansTRUE:sd_diff15)              -0.29     0.43 1.00
## cor(trial,evidence:meansTRUE:sd_diff15)                  -0.41     0.25 1.00
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15)     -0.41     0.35 1.00
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15)     -0.27     0.49 1.00
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15)     0.05     0.76 1.00
## cor(evidence:trial,evidence:meansTRUE:sd_diff15)         -0.44     0.25 1.00
##                                                       Bulk_ESS Tail_ESS
## sd(Intercept)                                             1658     2254
## sd(evidence)                                              1610     2060
## sd(meansTRUE)                                              725     1035
## sd(sd_diff15)                                             1259     2076
## sd(trial)                                                 1278     2079
## sd(evidence:meansTRUE)                                     602      842
## sd(evidence:sd_diff15)                                     949     1771
## sd(meansTRUE:sd_diff15)                                    728     1272
## sd(evidence:trial)                                        1154     1686
## sd(evidence:meansTRUE:sd_diff15)                           450      450
## cor(Intercept,evidence)                                   1204     2126
## cor(Intercept,meansTRUE)                                  1315     1689
## cor(evidence,meansTRUE)                                    904     1477
## cor(Intercept,sd_diff15)                                  1558     2200
## cor(evidence,sd_diff15)                                   1077     1699
## cor(meansTRUE,sd_diff15)                                   850     1468
## cor(Intercept,trial)                                      1319     1761
## cor(evidence,trial)                                       1074     2075
## cor(meansTRUE,trial)                                      1116     1544
## cor(sd_diff15,trial)                                       972     1537
## cor(Intercept,evidence:meansTRUE)                         1892     2554
## cor(evidence,evidence:meansTRUE)                          1016     1895
## cor(meansTRUE,evidence:meansTRUE)                          783     1485
## cor(sd_diff15,evidence:meansTRUE)                          888     1609
## cor(trial,evidence:meansTRUE)                              858     1516
## cor(Intercept,evidence:sd_diff15)                         1570     2225
## cor(evidence,evidence:sd_diff15)                          1205     1942
## cor(meansTRUE,evidence:sd_diff15)                         1148     2171
## cor(sd_diff15,evidence:sd_diff15)                         1147     2056
## cor(trial,evidence:sd_diff15)                             1413     2331
## cor(evidence:meansTRUE,evidence:sd_diff15)                1031     2027
## cor(Intercept,meansTRUE:sd_diff15)                        1922     2177
## cor(evidence,meansTRUE:sd_diff15)                         1643     2163
## cor(meansTRUE,meansTRUE:sd_diff15)                        1056     1585
## cor(sd_diff15,meansTRUE:sd_diff15)                        1033     1711
## cor(trial,meansTRUE:sd_diff15)                            1222     1941
## cor(evidence:meansTRUE,meansTRUE:sd_diff15)               1175     1764
## cor(evidence:sd_diff15,meansTRUE:sd_diff15)               1157     1837
## cor(Intercept,evidence:trial)                             1630     2562
## cor(evidence,evidence:trial)                              1473     2429
## cor(meansTRUE,evidence:trial)                              719     1550
## cor(sd_diff15,evidence:trial)                              906     1955
## cor(trial,evidence:trial)                                 1129     1771
## cor(evidence:meansTRUE,evidence:trial)                     951     1883
## cor(evidence:sd_diff15,evidence:trial)                    1016     1839
## cor(meansTRUE:sd_diff15,evidence:trial)                    732     1100
## cor(Intercept,evidence:meansTRUE:sd_diff15)               2066     2232
## cor(evidence,evidence:meansTRUE:sd_diff15)                2059     2239
## cor(meansTRUE,evidence:meansTRUE:sd_diff15)               1666     2221
## cor(sd_diff15,evidence:meansTRUE:sd_diff15)               1669     2142
## cor(trial,evidence:meansTRUE:sd_diff15)                   1581     2132
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15)      1411     1993
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15)      1297     2065
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15)      845     1186
## cor(evidence:trial,evidence:meansTRUE:sd_diff15)          1405     1986
## 
## Population-Level Effects: 
##                                                                 Estimate
## Intercept                                                           0.32
## evidence                                                            2.14
## meansTRUE                                                          -0.39
## sd_diff15                                                           1.06
## conditionHOPs                                                      -0.24
## conditionintervals                                                 -0.33
## conditionQDPs                                                       0.31
## start_meansTRUE                                                    -0.48
## trial                                                               1.25
## evidence:meansTRUE                                                 -0.14
## evidence:sd_diff15                                                  0.59
## meansTRUE:sd_diff15                                                 0.64
## evidence:conditionHOPs                                             -0.19
## evidence:conditionintervals                                        -0.19
## evidence:conditionQDPs                                              0.32
## meansTRUE:conditionHOPs                                            -0.00
## meansTRUE:conditionintervals                                       -0.01
## meansTRUE:conditionQDPs                                            -0.33
## sd_diff15:conditionHOPs                                             0.48
## sd_diff15:conditionintervals                                        0.37
## sd_diff15:conditionQDPs                                             0.10
## evidence:start_meansTRUE                                           -0.48
## meansTRUE:start_meansTRUE                                           0.47
## sd_diff15:start_meansTRUE                                           0.51
## conditionHOPs:start_meansTRUE                                      -0.39
## conditionintervals:start_meansTRUE                                 -0.31
## conditionQDPs:start_meansTRUE                                       0.20
## evidence:trial                                                      1.70
## conditionHOPs:trial                                                -0.02
## conditionintervals:trial                                            0.65
## conditionQDPs:trial                                                 0.33
## evidence:meansTRUE:sd_diff15                                        0.03
## evidence:meansTRUE:conditionHOPs                                   -0.26
## evidence:meansTRUE:conditionintervals                               0.17
## evidence:meansTRUE:conditionQDPs                                    0.06
## evidence:sd_diff15:conditionHOPs                                    0.12
## evidence:sd_diff15:conditionintervals                               0.36
## evidence:sd_diff15:conditionQDPs                                    0.23
## meansTRUE:sd_diff15:conditionHOPs                                  -0.54
## meansTRUE:sd_diff15:conditionintervals                              0.54
## meansTRUE:sd_diff15:conditionQDPs                                   0.26
## evidence:meansTRUE:start_meansTRUE                                  0.39
## evidence:sd_diff15:start_meansTRUE                                  0.22
## meansTRUE:sd_diff15:start_meansTRUE                                -0.06
## evidence:conditionHOPs:start_meansTRUE                             -0.36
## evidence:conditionintervals:start_meansTRUE                         0.11
## evidence:conditionQDPs:start_meansTRUE                             -0.11
## meansTRUE:conditionHOPs:start_meansTRUE                             0.17
## meansTRUE:conditionintervals:start_meansTRUE                        0.14
## meansTRUE:conditionQDPs:start_meansTRUE                             0.00
## sd_diff15:conditionHOPs:start_meansTRUE                             0.13
## sd_diff15:conditionintervals:start_meansTRUE                       -0.33
## sd_diff15:conditionQDPs:start_meansTRUE                            -0.24
## evidence:conditionHOPs:trial                                       -0.44
## evidence:conditionintervals:trial                                   0.68
## evidence:conditionQDPs:trial                                        0.40
## evidence:meansTRUE:sd_diff15:conditionHOPs                         -0.47
## evidence:meansTRUE:sd_diff15:conditionintervals                     0.25
## evidence:meansTRUE:sd_diff15:conditionQDPs                         -0.06
## evidence:meansTRUE:sd_diff15:start_meansTRUE                        0.31
## evidence:meansTRUE:conditionHOPs:start_meansTRUE                    0.55
## evidence:meansTRUE:conditionintervals:start_meansTRUE              -0.02
## evidence:meansTRUE:conditionQDPs:start_meansTRUE                    0.15
## evidence:sd_diff15:conditionHOPs:start_meansTRUE                    0.10
## evidence:sd_diff15:conditionintervals:start_meansTRUE              -0.40
## evidence:sd_diff15:conditionQDPs:start_meansTRUE                   -0.02
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                   0.35
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE             -0.26
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                  -0.17
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE          0.25
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE     0.29
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE         -0.06
##                                                                 Est.Error
## Intercept                                                            0.17
## evidence                                                             0.15
## meansTRUE                                                            0.18
## sd_diff15                                                            0.16
## conditionHOPs                                                        0.24
## conditionintervals                                                   0.23
## conditionQDPs                                                        0.23
## start_meansTRUE                                                      0.22
## trial                                                                0.23
## evidence:meansTRUE                                                   0.17
## evidence:sd_diff15                                                   0.16
## meansTRUE:sd_diff15                                                  0.19
## evidence:conditionHOPs                                               0.19
## evidence:conditionintervals                                          0.19
## evidence:conditionQDPs                                               0.20
## meansTRUE:conditionHOPs                                              0.25
## meansTRUE:conditionintervals                                         0.24
## meansTRUE:conditionQDPs                                              0.26
## sd_diff15:conditionHOPs                                              0.21
## sd_diff15:conditionintervals                                         0.21
## sd_diff15:conditionQDPs                                              0.21
## evidence:start_meansTRUE                                             0.19
## meansTRUE:start_meansTRUE                                            0.23
## sd_diff15:start_meansTRUE                                            0.20
## conditionHOPs:start_meansTRUE                                        0.31
## conditionintervals:start_meansTRUE                                   0.30
## conditionQDPs:start_meansTRUE                                        0.31
## evidence:trial                                                       0.22
## conditionHOPs:trial                                                  0.31
## conditionintervals:trial                                             0.32
## conditionQDPs:trial                                                  0.32
## evidence:meansTRUE:sd_diff15                                         0.21
## evidence:meansTRUE:conditionHOPs                                     0.22
## evidence:meansTRUE:conditionintervals                                0.23
## evidence:meansTRUE:conditionQDPs                                     0.23
## evidence:sd_diff15:conditionHOPs                                     0.20
## evidence:sd_diff15:conditionintervals                                0.21
## evidence:sd_diff15:conditionQDPs                                     0.21
## meansTRUE:sd_diff15:conditionHOPs                                    0.25
## meansTRUE:sd_diff15:conditionintervals                               0.26
## meansTRUE:sd_diff15:conditionQDPs                                    0.25
## evidence:meansTRUE:start_meansTRUE                                   0.21
## evidence:sd_diff15:start_meansTRUE                                   0.20
## meansTRUE:sd_diff15:start_meansTRUE                                  0.23
## evidence:conditionHOPs:start_meansTRUE                               0.27
## evidence:conditionintervals:start_meansTRUE                          0.27
## evidence:conditionQDPs:start_meansTRUE                               0.28
## meansTRUE:conditionHOPs:start_meansTRUE                              0.32
## meansTRUE:conditionintervals:start_meansTRUE                         0.31
## meansTRUE:conditionQDPs:start_meansTRUE                              0.33
## sd_diff15:conditionHOPs:start_meansTRUE                              0.28
## sd_diff15:conditionintervals:start_meansTRUE                         0.27
## sd_diff15:conditionQDPs:start_meansTRUE                              0.28
## evidence:conditionHOPs:trial                                         0.29
## evidence:conditionintervals:trial                                    0.30
## evidence:conditionQDPs:trial                                         0.31
## evidence:meansTRUE:sd_diff15:conditionHOPs                           0.26
## evidence:meansTRUE:sd_diff15:conditionintervals                      0.26
## evidence:meansTRUE:sd_diff15:conditionQDPs                           0.27
## evidence:meansTRUE:sd_diff15:start_meansTRUE                         0.23
## evidence:meansTRUE:conditionHOPs:start_meansTRUE                     0.29
## evidence:meansTRUE:conditionintervals:start_meansTRUE                0.31
## evidence:meansTRUE:conditionQDPs:start_meansTRUE                     0.31
## evidence:sd_diff15:conditionHOPs:start_meansTRUE                     0.27
## evidence:sd_diff15:conditionintervals:start_meansTRUE                0.27
## evidence:sd_diff15:conditionQDPs:start_meansTRUE                     0.28
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                    0.32
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE               0.32
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                    0.32
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE           0.32
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE      0.33
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE           0.33
##                                                                 l-95% CI
## Intercept                                                          -0.01
## evidence                                                            1.85
## meansTRUE                                                          -0.74
## sd_diff15                                                           0.76
## conditionHOPs                                                      -0.71
## conditionintervals                                                 -0.79
## conditionQDPs                                                      -0.14
## start_meansTRUE                                                    -0.92
## trial                                                               0.80
## evidence:meansTRUE                                                 -0.46
## evidence:sd_diff15                                                  0.28
## meansTRUE:sd_diff15                                                 0.27
## evidence:conditionHOPs                                             -0.56
## evidence:conditionintervals                                        -0.57
## evidence:conditionQDPs                                             -0.08
## meansTRUE:conditionHOPs                                            -0.52
## meansTRUE:conditionintervals                                       -0.49
## meansTRUE:conditionQDPs                                            -0.82
## sd_diff15:conditionHOPs                                             0.06
## sd_diff15:conditionintervals                                       -0.03
## sd_diff15:conditionQDPs                                            -0.32
## evidence:start_meansTRUE                                           -0.87
## meansTRUE:start_meansTRUE                                           0.02
## sd_diff15:start_meansTRUE                                           0.13
## conditionHOPs:start_meansTRUE                                      -0.98
## conditionintervals:start_meansTRUE                                 -0.90
## conditionQDPs:start_meansTRUE                                      -0.38
## evidence:trial                                                      1.29
## conditionHOPs:trial                                                -0.61
## conditionintervals:trial                                            0.00
## conditionQDPs:trial                                                -0.29
## evidence:meansTRUE:sd_diff15                                       -0.38
## evidence:meansTRUE:conditionHOPs                                   -0.70
## evidence:meansTRUE:conditionintervals                              -0.28
## evidence:meansTRUE:conditionQDPs                                   -0.41
## evidence:sd_diff15:conditionHOPs                                   -0.27
## evidence:sd_diff15:conditionintervals                              -0.04
## evidence:sd_diff15:conditionQDPs                                   -0.16
## meansTRUE:sd_diff15:conditionHOPs                                  -1.02
## meansTRUE:sd_diff15:conditionintervals                              0.02
## meansTRUE:sd_diff15:conditionQDPs                                  -0.24
## evidence:meansTRUE:start_meansTRUE                                 -0.03
## evidence:sd_diff15:start_meansTRUE                                 -0.17
## meansTRUE:sd_diff15:start_meansTRUE                                -0.51
## evidence:conditionHOPs:start_meansTRUE                             -0.87
## evidence:conditionintervals:start_meansTRUE                        -0.41
## evidence:conditionQDPs:start_meansTRUE                             -0.66
## meansTRUE:conditionHOPs:start_meansTRUE                            -0.48
## meansTRUE:conditionintervals:start_meansTRUE                       -0.47
## meansTRUE:conditionQDPs:start_meansTRUE                            -0.63
## sd_diff15:conditionHOPs:start_meansTRUE                            -0.41
## sd_diff15:conditionintervals:start_meansTRUE                       -0.85
## sd_diff15:conditionQDPs:start_meansTRUE                            -0.80
## evidence:conditionHOPs:trial                                       -1.00
## evidence:conditionintervals:trial                                   0.11
## evidence:conditionQDPs:trial                                       -0.20
## evidence:meansTRUE:sd_diff15:conditionHOPs                         -0.97
## evidence:meansTRUE:sd_diff15:conditionintervals                    -0.26
## evidence:meansTRUE:sd_diff15:conditionQDPs                         -0.56
## evidence:meansTRUE:sd_diff15:start_meansTRUE                       -0.15
## evidence:meansTRUE:conditionHOPs:start_meansTRUE                   -0.01
## evidence:meansTRUE:conditionintervals:start_meansTRUE              -0.61
## evidence:meansTRUE:conditionQDPs:start_meansTRUE                   -0.47
## evidence:sd_diff15:conditionHOPs:start_meansTRUE                   -0.42
## evidence:sd_diff15:conditionintervals:start_meansTRUE              -0.93
## evidence:sd_diff15:conditionQDPs:start_meansTRUE                   -0.54
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                  -0.27
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE             -0.88
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                  -0.82
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE         -0.37
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE    -0.35
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE         -0.69
##                                                                 u-95% CI Rhat
## Intercept                                                           0.65 1.00
## evidence                                                            2.43 1.00
## meansTRUE                                                          -0.03 1.00
## sd_diff15                                                           1.36 1.00
## conditionHOPs                                                       0.20 1.00
## conditionintervals                                                  0.12 1.00
## conditionQDPs                                                       0.77 1.00
## start_meansTRUE                                                    -0.05 1.00
## trial                                                               1.69 1.00
## evidence:meansTRUE                                                  0.19 1.00
## evidence:sd_diff15                                                  0.90 1.00
## meansTRUE:sd_diff15                                                 1.01 1.00
## evidence:conditionHOPs                                              0.20 1.00
## evidence:conditionintervals                                         0.18 1.00
## evidence:conditionQDPs                                              0.71 1.00
## meansTRUE:conditionHOPs                                             0.48 1.00
## meansTRUE:conditionintervals                                        0.46 1.00
## meansTRUE:conditionQDPs                                             0.18 1.00
## sd_diff15:conditionHOPs                                             0.90 1.00
## sd_diff15:conditionintervals                                        0.80 1.00
## sd_diff15:conditionQDPs                                             0.51 1.00
## evidence:start_meansTRUE                                           -0.10 1.00
## meansTRUE:start_meansTRUE                                           0.93 1.00
## sd_diff15:start_meansTRUE                                           0.90 1.00
## conditionHOPs:start_meansTRUE                                       0.20 1.00
## conditionintervals:start_meansTRUE                                  0.26 1.00
## conditionQDPs:start_meansTRUE                                       0.82 1.00
## evidence:trial                                                      2.14 1.00
## conditionHOPs:trial                                                 0.59 1.00
## conditionintervals:trial                                            1.26 1.00
## conditionQDPs:trial                                                 0.95 1.00
## evidence:meansTRUE:sd_diff15                                        0.44 1.00
## evidence:meansTRUE:conditionHOPs                                    0.18 1.00
## evidence:meansTRUE:conditionintervals                               0.63 1.00
## evidence:meansTRUE:conditionQDPs                                    0.51 1.00
## evidence:sd_diff15:conditionHOPs                                    0.52 1.00
## evidence:sd_diff15:conditionintervals                               0.76 1.00
## evidence:sd_diff15:conditionQDPs                                    0.63 1.00
## meansTRUE:sd_diff15:conditionHOPs                                  -0.04 1.00
## meansTRUE:sd_diff15:conditionintervals                              1.04 1.00
## meansTRUE:sd_diff15:conditionQDPs                                   0.77 1.00
## evidence:meansTRUE:start_meansTRUE                                  0.81 1.00
## evidence:sd_diff15:start_meansTRUE                                  0.60 1.00
## meansTRUE:sd_diff15:start_meansTRUE                                 0.39 1.00
## evidence:conditionHOPs:start_meansTRUE                              0.15 1.00
## evidence:conditionintervals:start_meansTRUE                         0.63 1.00
## evidence:conditionQDPs:start_meansTRUE                              0.42 1.00
## meansTRUE:conditionHOPs:start_meansTRUE                             0.81 1.00
## meansTRUE:conditionintervals:start_meansTRUE                        0.73 1.00
## meansTRUE:conditionQDPs:start_meansTRUE                             0.65 1.00
## sd_diff15:conditionHOPs:start_meansTRUE                             0.70 1.00
## sd_diff15:conditionintervals:start_meansTRUE                        0.21 1.00
## sd_diff15:conditionQDPs:start_meansTRUE                             0.30 1.00
## evidence:conditionHOPs:trial                                        0.11 1.00
## evidence:conditionintervals:trial                                   1.24 1.00
## evidence:conditionQDPs:trial                                        0.99 1.00
## evidence:meansTRUE:sd_diff15:conditionHOPs                          0.03 1.00
## evidence:meansTRUE:sd_diff15:conditionintervals                     0.78 1.00
## evidence:meansTRUE:sd_diff15:conditionQDPs                          0.48 1.00
## evidence:meansTRUE:sd_diff15:start_meansTRUE                        0.77 1.00
## evidence:meansTRUE:conditionHOPs:start_meansTRUE                    1.14 1.00
## evidence:meansTRUE:conditionintervals:start_meansTRUE               0.57 1.00
## evidence:meansTRUE:conditionQDPs:start_meansTRUE                    0.74 1.00
## evidence:sd_diff15:conditionHOPs:start_meansTRUE                    0.64 1.00
## evidence:sd_diff15:conditionintervals:start_meansTRUE               0.13 1.00
## evidence:sd_diff15:conditionQDPs:start_meansTRUE                    0.53 1.00
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                   0.95 1.00
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE              0.37 1.00
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                   0.45 1.00
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE          0.90 1.00
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE     0.93 1.00
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE          0.57 1.00
##                                                                 Bulk_ESS
## Intercept                                                           1429
## evidence                                                            1545
## meansTRUE                                                           2016
## sd_diff15                                                           1923
## conditionHOPs                                                       1342
## conditionintervals                                                  1586
## conditionQDPs                                                       1573
## start_meansTRUE                                                     1554
## trial                                                               2108
## evidence:meansTRUE                                                  2159
## evidence:sd_diff15                                                  1469
## meansTRUE:sd_diff15                                                 2018
## evidence:conditionHOPs                                              1723
## evidence:conditionintervals                                         1779
## evidence:conditionQDPs                                              1988
## meansTRUE:conditionHOPs                                             2214
## meansTRUE:conditionintervals                                        2339
## meansTRUE:conditionQDPs                                             2285
## sd_diff15:conditionHOPs                                             2062
## sd_diff15:conditionintervals                                        2044
## sd_diff15:conditionQDPs                                             2335
## evidence:start_meansTRUE                                            1475
## meansTRUE:start_meansTRUE                                           2105
## sd_diff15:start_meansTRUE                                           2231
## conditionHOPs:start_meansTRUE                                       1633
## conditionintervals:start_meansTRUE                                  1599
## conditionQDPs:start_meansTRUE                                       2063
## evidence:trial                                                      2238
## conditionHOPs:trial                                                 2193
## conditionintervals:trial                                            2503
## conditionQDPs:trial                                                 1974
## evidence:meansTRUE:sd_diff15                                        2127
## evidence:meansTRUE:conditionHOPs                                    2250
## evidence:meansTRUE:conditionintervals                               2190
## evidence:meansTRUE:conditionQDPs                                    2319
## evidence:sd_diff15:conditionHOPs                                    2002
## evidence:sd_diff15:conditionintervals                               2281
## evidence:sd_diff15:conditionQDPs                                    2249
## meansTRUE:sd_diff15:conditionHOPs                                   2309
## meansTRUE:sd_diff15:conditionintervals                              2063
## meansTRUE:sd_diff15:conditionQDPs                                   2446
## evidence:meansTRUE:start_meansTRUE                                  2036
## evidence:sd_diff15:start_meansTRUE                                  1712
## meansTRUE:sd_diff15:start_meansTRUE                                 2169
## evidence:conditionHOPs:start_meansTRUE                              1732
## evidence:conditionintervals:start_meansTRUE                         2067
## evidence:conditionQDPs:start_meansTRUE                              2015
## meansTRUE:conditionHOPs:start_meansTRUE                             2319
## meansTRUE:conditionintervals:start_meansTRUE                        2249
## meansTRUE:conditionQDPs:start_meansTRUE                             2582
## sd_diff15:conditionHOPs:start_meansTRUE                             2280
## sd_diff15:conditionintervals:start_meansTRUE                        2356
## sd_diff15:conditionQDPs:start_meansTRUE                             2219
## evidence:conditionHOPs:trial                                        2285
## evidence:conditionintervals:trial                                   2570
## evidence:conditionQDPs:trial                                        2566
## evidence:meansTRUE:sd_diff15:conditionHOPs                          2340
## evidence:meansTRUE:sd_diff15:conditionintervals                     2426
## evidence:meansTRUE:sd_diff15:conditionQDPs                          2329
## evidence:meansTRUE:sd_diff15:start_meansTRUE                        2358
## evidence:meansTRUE:conditionHOPs:start_meansTRUE                    2310
## evidence:meansTRUE:conditionintervals:start_meansTRUE               2305
## evidence:meansTRUE:conditionQDPs:start_meansTRUE                    2419
## evidence:sd_diff15:conditionHOPs:start_meansTRUE                    2203
## evidence:sd_diff15:conditionintervals:start_meansTRUE               2395
## evidence:sd_diff15:conditionQDPs:start_meansTRUE                    2281
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                   2356
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE              2305
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                   2554
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE          2346
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE     2530
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE          2597
##                                                                 Tail_ESS
## Intercept                                                           1758
## evidence                                                            2074
## meansTRUE                                                           2416
## sd_diff15                                                           2265
## conditionHOPs                                                       2047
## conditionintervals                                                  2207
## conditionQDPs                                                       2173
## start_meansTRUE                                                     2136
## trial                                                               2433
## evidence:meansTRUE                                                  2630
## evidence:sd_diff15                                                  2394
## meansTRUE:sd_diff15                                                 2498
## evidence:conditionHOPs                                              2246
## evidence:conditionintervals                                         2619
## evidence:conditionQDPs                                              2468
## meansTRUE:conditionHOPs                                             2687
## meansTRUE:conditionintervals                                        2831
## meansTRUE:conditionQDPs                                             2227
## sd_diff15:conditionHOPs                                             2642
## sd_diff15:conditionintervals                                        2430
## sd_diff15:conditionQDPs                                             2666
## evidence:start_meansTRUE                                            2022
## meansTRUE:start_meansTRUE                                           2441
## sd_diff15:start_meansTRUE                                           2593
## conditionHOPs:start_meansTRUE                                       2154
## conditionintervals:start_meansTRUE                                  2384
## conditionQDPs:start_meansTRUE                                       2285
## evidence:trial                                                      2523
## conditionHOPs:trial                                                 2642
## conditionintervals:trial                                            2719
## conditionQDPs:trial                                                 2380
## evidence:meansTRUE:sd_diff15                                        2720
## evidence:meansTRUE:conditionHOPs                                    2590
## evidence:meansTRUE:conditionintervals                               2416
## evidence:meansTRUE:conditionQDPs                                    2344
## evidence:sd_diff15:conditionHOPs                                    2636
## evidence:sd_diff15:conditionintervals                               2651
## evidence:sd_diff15:conditionQDPs                                    2587
## meansTRUE:sd_diff15:conditionHOPs                                   2694
## meansTRUE:sd_diff15:conditionintervals                              2815
## meansTRUE:sd_diff15:conditionQDPs                                   2630
## evidence:meansTRUE:start_meansTRUE                                  2772
## evidence:sd_diff15:start_meansTRUE                                  2370
## meansTRUE:sd_diff15:start_meansTRUE                                 2589
## evidence:conditionHOPs:start_meansTRUE                              2418
## evidence:conditionintervals:start_meansTRUE                         2451
## evidence:conditionQDPs:start_meansTRUE                              2441
## meansTRUE:conditionHOPs:start_meansTRUE                             2622
## meansTRUE:conditionintervals:start_meansTRUE                        2751
## meansTRUE:conditionQDPs:start_meansTRUE                             2659
## sd_diff15:conditionHOPs:start_meansTRUE                             2507
## sd_diff15:conditionintervals:start_meansTRUE                        2629
## sd_diff15:conditionQDPs:start_meansTRUE                             2444
## evidence:conditionHOPs:trial                                        2686
## evidence:conditionintervals:trial                                   2604
## evidence:conditionQDPs:trial                                        2720
## evidence:meansTRUE:sd_diff15:conditionHOPs                          2351
## evidence:meansTRUE:sd_diff15:conditionintervals                     2534
## evidence:meansTRUE:sd_diff15:conditionQDPs                          2632
## evidence:meansTRUE:sd_diff15:start_meansTRUE                        2467
## evidence:meansTRUE:conditionHOPs:start_meansTRUE                    2664
## evidence:meansTRUE:conditionintervals:start_meansTRUE               2346
## evidence:meansTRUE:conditionQDPs:start_meansTRUE                    2532
## evidence:sd_diff15:conditionHOPs:start_meansTRUE                    2631
## evidence:sd_diff15:conditionintervals:start_meansTRUE               2362
## evidence:sd_diff15:conditionQDPs:start_meansTRUE                    2598
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                   2517
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE              2664
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                   2807
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE          2244
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE     2561
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE          2603
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Model Comparison

Does adding block order to our previous model improve WAIC over our best model so far?

waic(m.m.logistic.r_means.sd.trial, m.m_order.logistic.r_means.sd.trial)
## Output of model 'm.m.logistic.r_means.sd.trial':
## 
## Computed from 3000 by 19924 log-likelihood matrix
## 
##           Estimate    SE
## elpd_waic  -7398.5  77.2
## p_waic      1843.9  30.2
## waic       14797.0 154.4
## 
## 961 (4.8%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
## 
## Output of model 'm.m_order.logistic.r_means.sd.trial':
## 
## Computed from 3000 by 19924 log-likelihood matrix
## 
##           Estimate    SE
## elpd_waic  -7389.7  77.2
## p_waic      1841.6  30.1
## waic       14779.3 154.5
## 
## 954 (4.8%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
## 
## Model comparisons:
##                                     elpd_diff se_diff
## m.m_order.logistic.r_means.sd.trial  0.0       0.0   
## m.m.logistic.r_means.sd.trial       -8.8       6.4

Adding block order reduces WAIC somewhat (though the difference in elpd is comparable to its standard error), so we’ll run with this maximal model for the purpose of statistical inference.
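One common rule of thumb is to treat an elpd difference as reliable only when it clearly exceeds about twice its standard error. Plugging in the numbers from the comparison above (this is just arithmetic on the printed output, not a formal test):

```r
# values copied from the waic() comparison above
elpd_diff <- -8.8
se_diff <- 6.4

# rule of thumb: is the difference large relative to its standard error?
abs(elpd_diff) > 2 * se_diff   # FALSE: the improvement is modest
```

By this heuristic the improvement from block order is within sampling noise, which is why we describe it as reducing WAIC only "somewhat."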

Predictive Checks

Let’s take a look at the estimated psychometric functions for each worker.

model_check_df %>%
  group_by(evidence, worker_id, means, sd_diff, condition, trial, start_means) %>%
  add_fitted_draws(m.m_order.logistic.r_means.sd.trial, value = "pf", n = 500) %>%
  ggplot(aes(x = evidence, y = intervene, color = condition, fill = condition)) +
  geom_vline(xintercept = 0, size = 1, alpha = .3, color = "red", linetype = "dashed") + # utility optimal decision rule
  stat_lineribbon(aes(y = pf), .width = c(.95), alpha = .25) +
  geom_point(alpha = .15) +
  scale_fill_brewer(type = "qual", palette = 2) +
  scale_color_brewer(type = "qual", palette = 2) + 
  coord_cartesian(xlim = quantile(model_df$evidence, c(0, 1)),
                  ylim = quantile(model_df$intervene, c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) +
  facet_wrap(. ~ worker_id)

It looks like the model is doing a better job of capturing variability in the responses of each individual worker.

What do the posteriors for just-noticeable differences (JNDs) and points of subjective equality (PSEs) look like? We’ll start by getting the slopes and intercepts of the linear model and using these to derive estimates of the JND and PSE for each fitted draw.

# get slopes from linear model
slopes_df <- model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(evidence = c(0, 1)) %>%
  add_fitted_draws(m.m_order.logistic.r_means.sd.trial, re_formula = NA, scale = "linear") %>%
  compare_levels(.value, by = evidence) %>%
  rename(slope = .value)

# get intercepts from linear model
intercepts_df <- model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(evidence = 0) %>%
  add_fitted_draws(m.m_order.logistic.r_means.sd.trial, re_formula = NA, scale = "linear") %>%
  rename(intercept = .value) 

# join dataframes for slopes and intercepts, calculate PSE and JND
stats_df <- slopes_df %>% 
  full_join(intercepts_df, by = c("means", "sd_diff", "condition", "trial", "start_means", ".draw")) %>%
  mutate(
    pse = -intercept / slope,
    jnd = qlogis(0.75) / slope
  )
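As a quick sanity check on these formulas: the psychometric function is P(intervene) = plogis(intercept + slope * evidence). Setting this to 0.5 gives pse = -intercept / slope, and the extra evidence needed to move from 50% to 75% is jnd = qlogis(0.75) / slope. A minimal sketch with made-up parameter values (not estimates from our model):

```r
# toy logistic psychometric function with known parameters
intercept <- 2   # hypothetical values for illustration only
slope <- 4
p_intervene <- function(evidence) plogis(intercept + slope * evidence)

# PSE: the evidence level where P(intervene) = 0.5
pse <- -intercept / slope
p_intervene(pse)         # 0.5

# JND: the additional evidence needed to reach 75%
jnd <- qlogis(0.75) / slope
p_intervene(pse + jnd)   # 0.75
```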

First, let’s look at the estimates of JNDs per visualization, marginalizing across other manipulations.

stats_df %>%
  group_by(condition, .draw) %>%          # marginalize out other manipulations
  summarise(jnd = weighted.mean(jnd)) %>%
  ggplot(aes(x = jnd, group = condition, color = condition, fill = condition)) +
  geom_density(alpha = 0.35) +
  scale_fill_brewer(type = "qual", palette = 2) + 
  scale_color_brewer(type = "qual", palette = 2) +
  scale_x_continuous(expression(jnd), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior JND per visualization") +
  theme(panel.grid = element_blank())

It looks like users are most sensitive to evidence (i.e., JNDs are smaller) in the quantile dotplots condition and are least sensitive with HOPs.
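Since we have full posteriors, we can also summarize pairwise JND differences between visualization conditions directly from the draws, rather than eyeballing the densities. A sketch using the same stats_df as above (assuming tidybayes is loaded, as in the earlier code):

```r
stats_df %>%
  group_by(condition, .draw) %>%
  summarise(jnd = weighted.mean(jnd)) %>%
  compare_levels(jnd, by = condition) %>%   # pairwise differences per draw
  group_by(condition) %>%
  median_qi(jnd)                            # posterior medians and 95% intervals
```

Differences whose intervals exclude zero (e.g., quantile dotplots vs. HOPs, if the densities above are any guide) would indicate a reliable difference in sensitivity.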

What if we look at the interaction effect on JNDs, marginalizing out block and trial order?

stats_df %>%
  group_by(means, sd_diff, condition, .draw) %>%          # marginalize out other manipulations
  summarise(jnd = weighted.mean(jnd)) %>%
  unite(vis_cond, condition, sd_diff) %>%
  # unite(means_present, means, start_means) %>%
  ggplot(aes(x = jnd, y = vis_cond)) +
  stat_halfeyeh() +
  labs(subtitle = "Posterior JND per condition") +
  theme_bw() +
  facet_grid(means ~ .)

Users are consistently more sensitive to evidence when uncertainty is high. Means seem to improve sensitivity a bit, most notably with intervals.

Next, we’ll look at the point of subjective equality for each visualization condition, marginalizing out other manipulations.

stats_df %>%
  group_by(condition, .draw) %>%          # marginalize out other manipulations
  summarise(pse = weighted.mean(pse)) %>%
  ggplot(aes(x = pse, group = condition, color = condition, fill = condition)) +
  geom_density(alpha = 0.35) +
  scale_fill_brewer(type = "qual", palette = 2) + 
  scale_color_brewer(type = "qual", palette = 2) +
  scale_x_continuous(expression(pse), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior PSE per visualization") +
  theme(panel.grid = element_blank())

It looks like the point of subjective equality is least biased with intervals, with progressively more bias toward intervening (i.e., more negative PSEs) for HOPs, densities, and quantile dotplots.

What if we look at the interaction effect on PSE, marginalizing out block and trial order?

stats_df %>%
  group_by(means, sd_diff, condition, .draw) %>%          # marginalize out other manipulations
  summarise(pse = weighted.mean(pse)) %>%
  unite(vis_cond, condition, sd_diff) %>%
  ggplot(aes(x = pse, y = vis_cond)) +
  stat_halfeyeh() +
  labs(subtitle = "Posterior PSE per condition") +
  theme_bw() +
  facet_grid(means ~ .)

There are a couple of interesting things going on here:

  1. People intervene more than they should when uncertainty is high in all visualization conditions.
  2. Adding means biases the PSE away from zero, leading to less utility-optimal decisions in all conditions except quantile dotplots.
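We can attach numbers to these observations by computing, for each cell of the design, the posterior probability that the PSE is negative (i.e., biased toward intervening). A sketch using stats_df from above:

```r
stats_df %>%
  group_by(means, sd_diff, condition, .draw) %>%   # marginalize out block and trial order
  summarise(pse = weighted.mean(pse)) %>%
  group_by(means, sd_diff, condition) %>%
  summarise(p_biased = mean(pse < 0))              # posterior P(PSE < 0)
```

Cells where p_biased is near 1 correspond to conditions where users reliably intervene more than the utility-optimal decision rule would prescribe.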